Abstract

This paper considers the problem of adaptive estimation of a mean pattern in a randomly 
shifted curve model. We show that this problem can be transformed into a linear inverse 
problem, where the density of the random shifts plays the role of a convolution operator. An
adaptive estimator of the mean pattern, based on wavelet thresholding, is proposed. We study
its consistency for the quadratic risk as the number of observed curves tends to infinity, and 
this estimator is shown to achieve a near-minimax rate of convergence over a large class of 
Besov balls. This rate depends both on the smoothness of the common shape of the curves 
and on the decay of the Fourier coefficients of the density of the random shifts. Hence, 
this paper makes a connection between mean pattern estimation and the statistical analysis 
of linear inverse problems, which is a new point of view on curve registration and image 
warping problems. Some numerical experiments are given to illustrate the performance of
our approach and to compare it with another algorithm existing in the literature.

Keywords: Mean pattern estimation, Curve registration, Inverse problem, Deconvolution, Meyer 
wavelets, Adaptive estimation, Besov space, Minimax rate. 
1 Introduction

1.1 Model and objectives 

Assume that we observe realizations of n noisy and randomly shifted curves $Y_1, \ldots, Y_n$ coming
from the following Gaussian white noise model

$$dY_m(x) = f(x - \tau_m)\,dx + \epsilon_m\,dW_m(x), \quad x \in [0, 1], \quad m = 1, \ldots, n, \tag{1.1}$$

where $f$ is the unknown mean pattern (also called common shape) of the curves, the $W_m$ are
independent standard Brownian motions on [0, 1], the $\epsilon_m$'s represent levels of noise which
may vary from curve to curve, and the $\tau_m$'s are unknown random shifts independent of the
$W_m$'s. The aim of this paper is to study some statistical aspects related to the problem of
estimating $f$.

This model is realistic in many situations where it is reasonable to assume that the observed 
curves represent replications of almost the same process and when a large source of variation in 
the experiments is due to transformations of the time axis. Such a model is commonly used in 
many applied areas dealing with functional data such as neuroscience [16] or biology [31]. More
generally, the model (1.1) represents a kind of benchmark model for studying the difficult problem
of recovering a mean pattern from a set of similar curves or images in the presence of random
deformations and additive noise, which corresponds to the general setting of Grenander's theory
of shapes [13]. The results derived in this paper show that the model (1.1), although simple,
already provides some new insights on the statistical aspects of mean pattern estimation, which
is a problem frequently encountered in image warping, see p2] and the discussion therein for an
overview.

The function $f : \mathbb{R} \to \mathbb{R}$ is assumed to be periodic with period 1, and the shifts $\tau_m$ are
supposed to be independent and identically distributed (iid) random variables with density
$g : \mathbb{R} \to \mathbb{R}^+$ with respect to the Lebesgue measure $dx$ on $\mathbb{R}$. Our goal is to estimate
nonparametrically the shape function $f$ on [0, 1] as the number of curves n goes to infinity.

Let $L^2([0, 1])$ be the space of square integrable functions on [0, 1] with respect to $dx$, and
denote by $\|f\|^2 = \int_0^1 |f(x)|^2\,dx$ the squared norm of a function $f$. Assume that
$\mathcal{F} \subset L^2([0, 1])$ represents some smoothness class of functions (e.g. a Sobolev or a Besov
ball), and let $\hat f_n \in L^2([0, 1])$ be some estimator of the common shape $f$, i.e. a measurable
function of the random processes $Y_m$, $m = 1, \ldots, n$. For some $f \in \mathcal{F}$, the risk of the
estimator $\hat f_n$ is defined to be

$$\mathcal{R}(\hat f_n, f) = \mathbb{E}\|\hat f_n - f\|^2$$

where the above expectation $\mathbb{E}$ is taken with respect to the law of $\{Y_m, m = 1, \ldots, n\}$. In this
paper, we propose to investigate the optimality of an estimator by introducing the following
minimax risk

$$\mathcal{R}_n(\mathcal{F}) = \inf_{\hat f_n} \sup_{f \in \mathcal{F}} \mathcal{R}(\hat f_n, f),$$

where the above infimum is taken over the set of all possible estimators in model (1.1). One of
the main contributions of this paper is to derive asymptotic lower and upper bounds for
$\mathcal{R}_n(\mathcal{F})$ which, to the best of our knowledge, have not been considered before.

Indeed, we show that there exist constants $M_1, M_2$, a sequence of reals $r_n = r_n(\mathcal{F})$ tending
to infinity, and an estimator $\hat f^*$ such that

$$\lim_{n \to +\infty} r_n \mathcal{R}_n(\mathcal{F}) \ge M_1 \quad \text{and} \quad \lim_{n \to +\infty} r_n \sup_{f \in \mathcal{F}} \mathcal{R}(\hat f^*, f) \le M_2.$$

However, the construction of $\hat f^*$ may depend on unknown quantities such as the smoothness of
$f$, and such estimates are therefore called non-adaptive. Since it is now recognized that wavelet
decomposition is a powerful tool to derive adaptive estimators, see e.g. [9], a second contribution
of this paper is thus to propose wavelet-based estimators $\hat f_n$ that attain a near-minimax rate of
convergence in the sense that there exists a constant $M_2$ such that

$$\lim_{n \to +\infty} (\log n)^{-\beta} r_n \sup_{f \in \mathcal{F}} \mathcal{R}(\hat f_n, f) \le M_2 \quad \text{for some } \beta > 0.$$



1.2 Main result 

Minimax risks will be derived under particular smoothness assumptions on the density $g$. The
main result of this paper is that the difficulty of estimating $f$ is quantified by the decay to zero
of the Fourier coefficients $\gamma_\ell$ of the density $g$ of the shifts, defined as

$$\gamma_\ell = \mathbb{E}\left(e^{-i 2\pi \ell \tau}\right) = \int_{-\infty}^{+\infty} e^{-i 2\pi \ell x} g(x)\,dx, \tag{1.2}$$

for $\ell \in \mathbb{Z}$. Depending on how fast these Fourier coefficients tend to zero as $|\ell| \to +\infty$, the
reconstruction of $f$ will be more or less accurate. This comes from the fact that the expected value of
each observed process $Y_m(x)$ is given by

$$\int_{-\infty}^{+\infty} f(x - \tau) g(\tau)\,d\tau, \quad \text{for } x \in [0, 1].$$

This expected value is thus the convolution of $f$ by the density $g$, which makes the problem
of estimating $f$ an inverse problem whose degree of ill-posedness and associated minimax risk
depend on the smoothness assumptions on $g$.

This phenomenon is a well-known fact in deconvolution problems, see e.g. [17], [27], [28], and
more generally for linear inverse problems as studied in [6]. In this paper, the following type of
assumption on $g$ is considered:

Assumption 1.1 The Fourier coefficients of $g$ have a polynomial decay, i.e. for some real $\nu > 0$,
there exist two constants $C_{\max} \ge C_{\min} > 0$ such that $C_{\min} |\ell|^{-\nu} \le |\gamma_\ell| \le C_{\max} |\ell|^{-\nu}$ for all $\ell \in \mathbb{Z}$.

In standard inverse problems such as deconvolution, the optimal rate of convergence we can
expect from an arbitrary estimator typically depends on such smoothness assumptions. The
parameter $\nu$ is usually referred to as the degree of ill-posedness of the inverse problem, and it
quantifies the difficulty of inverting the convolution operator. The following theorem shows that a
similar phenomenon holds for the minimax risk associated to model (1.1). Note that to simplify
the presentation, all the theoretical results are given for the simple setting where the level of
noise is the same for all curves, i.e. $\epsilon_m = \epsilon$ for all $m = 1, \ldots, n$ and some $\epsilon > 0$.

Theorem 1.1 Suppose that the smoothness class $\mathcal{F}$ is a Besov ball $B^s_{p,q}(A)$ of radius $A > 0$ with
$p, q \ge 1$ and smoothness parameter $s > 0$ (a precise definition of Besov spaces will be given later
on). Suppose that $g$ satisfies Assumption 1.1 and is such that there exist two constants $C > 0$
and $\alpha > 1$ satisfying $g(x) \le \frac{C}{1 + |x|^\alpha}$ for all $x \in \mathbb{R}$. Let $p' = \min(2, p)$ and assume that $s > 1/p'$.

If $s > 2\nu + 1$, then

$$r_n(\mathcal{F}) = n^{\frac{2s}{2s + 2\nu + 1}}.$$

Hence, Theorem 1.1 shows that under Assumption 1.1 the minimax rate $r_n$ is of polynomial order
in the sample size n, and that this rate deteriorates as the degree of ill-posedness $\nu$ increases.
Such a behavior is well known for standard periodic deconvolution in the white noise model [17],
[27], and Theorem 1.1 shows that a similar phenomenon holds for the model (1.1). To the best of
our knowledge, this is a new result which makes a connection between mean pattern estimation
and the statistical analysis of deconvolution problems.
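
To make the dependence on the degree of ill-posedness concrete, the short sketch below evaluates the exponent of the rate of Theorem 1.1 for a fixed smoothness and several values of $\nu$. The function name and the values of s and $\nu$ are illustrative assumptions and do not come from the paper.

```python
from fractions import Fraction


def minimax_rate_exponent(s, nu):
    """Return a such that r_n = n**a in Theorem 1.1, i.e. a = 2s / (2s + 2nu + 1)."""
    s, nu = Fraction(s), Fraction(nu)
    return 2 * s / (2 * s + 2 * nu + 1)


# Illustrative values only: s = 6 satisfies s > 2*nu + 1 for each nu below.
for nu in (Fraction(1, 2), 1, 2):
    print(f"nu = {nu}: r_n = n^({minimax_rate_exponent(6, nu)})")
```

With s = 6 the exponent drops from 6/7 to 4/5 to 12/17 as $\nu$ goes from 1/2 to 1 to 2, which is the deterioration described above.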

1.3 Fourier Analysis and an inverse problem formulation 

Let us first remark that the model (1.1) exhibits some similarities with periodic deconvolution
in the white noise model as described in [17]. For $x \in [0, 1]$, let us define the following density
function

$$G(x) = \sum_{k \in \mathbb{Z}} g(x + k). \tag{1.3}$$

Note that $G(x)$ exists for all $x \in [0, 1]$ provided $g$ has a sufficiently fast decay at infinity. In
particular, the condition that $g(x) \le \frac{C}{1 + |x|^\alpha}$ for all $x \in \mathbb{R}$ and some $\alpha > 1$ (see Assumption 3.1
below) is sufficient to guarantee the existence of $G$. Since $f$ is periodic with period 1, one has

$$\int_{-\infty}^{+\infty} f(x - \tau) g(\tau)\,d\tau = \int_0^1 f(x - \tau) G(\tau)\,d\tau,$$

and note that $\gamma_\ell = \int_{-\infty}^{+\infty} e^{-i 2\pi \ell x} g(x)\,dx = \int_0^1 e^{-i 2\pi \ell x} G(x)\,dx$. Hence, if one defines
$\xi_m(x) = f(x - \tau_m) - \int_0^1 f(x - \tau) G(\tau)\,d\tau$ and $\xi(x) = \frac{1}{n} \sum_{m=1}^n \xi_m(x)$, then taking the mean of the n
equations in (1.1) yields the model

$$dY(x) = \int_0^1 f(x - \tau) G(\tau)\,d\tau\,dx + \xi(x)\,dx + \frac{\epsilon}{\sqrt{n}}\,dW(x), \quad x \in [0, 1], \tag{1.4}$$

with $\epsilon^2 = \frac{1}{n} \sum_{m=1}^n \epsilon_m^2$ and where $W(x)$ is a standard Brownian motion on [0, 1].
The model (1.4) differs from the periodic deconvolution model investigated in [17] by the
error term $\xi(x)$. Asymptotically $\xi(x)$ is a Gaussian variable, which suggests using the wavelet
thresholding procedures developed in [17] to derive upper bounds for the minimax risk. However,
it should be noted that the additive error term $\xi(x)$ significantly complicates the estimation
procedure as the variance of $\xi(x)$ clearly depends on the unknown function $f$ and must therefore
be estimated. Moreover, deriving lower bounds for the minimax risk in models such as (1.4) is
significantly more difficult than in the standard white noise model without the additive term
$\xi(x)$.

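Now let us formulate models (1.1) and (1.4) in the Fourier domain. Supposing that $f \in L^2([0, 1])$,
we denote by $\theta_\ell$ its Fourier coefficients for $\ell \in \mathbb{Z}$, namely $\theta_\ell = \int_0^1 e^{-2i\ell\pi x} f(x)\,dx$. The
model (1.1) can then be rewritten as

$$c_{m,\ell} := \int_0^1 e^{-2i\ell\pi x}\,dY_m(x) = \theta_\ell e^{-i 2\pi \ell \tau_m} + \epsilon_m z_{\ell,m} = \theta_\ell \gamma_\ell + \xi_{\ell,m} + \epsilon_m z_{\ell,m}, \tag{1.5}$$

with $\xi_{\ell,m} = \theta_\ell e^{-i 2\pi \ell \tau_m} - \theta_\ell \gamma_\ell$, and where the $z_{\ell,m}$ are iid $N_{\mathbb{C}}(0, 1)$ variables, i.e. complex Gaussian
variables with zero mean and such that $\mathbb{E}|z_{\ell,m}|^2 = 1$.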

Thus, we can compute the sample mean $\tilde c_\ell$ of the $\ell$-th Fourier coefficient over the n curves as

$$\tilde c_\ell = \frac{1}{n} \sum_{m=1}^n c_{m,\ell} = \theta_\ell \tilde\gamma_\ell + \epsilon \eta_\ell = \theta_\ell \gamma_\ell + \xi_\ell + \epsilon \eta_\ell, \tag{1.6}$$

with $\tilde\gamma_\ell = \frac{1}{n} \sum_{m=1}^n e^{-i 2\pi \ell \tau_m}$, $\xi_\ell = \frac{1}{n} \sum_{m=1}^n \xi_{\ell,m}$, $\epsilon^2 = \frac{1}{n} \sum_{m=1}^n \epsilon_m^2$, and where the $\eta_\ell$'s are iid
complex Gaussian variables with zero mean and such that $\mathbb{E}|\eta_\ell|^2 = \frac{1}{n}$. The average Fourier
coefficients $\tilde c_\ell$ in equation (1.6) can thus be viewed as a set of observations which is very close to
a sequence space formulation of a statistical inverse problem as described e.g. by [6]. As in model
(1.4) the additive error term $\xi_\ell$ is asymptotically Gaussian, however its variance is $\frac{1}{n}|\theta_\ell|^2(1 - |\gamma_\ell|^2)$,
which is obviously unknown as it depends on $f$, and thus it has to be estimated.

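If we assume that the density $g$ of the random shifts is known, one can perform a deconvolution
step by taking

$$\hat\theta_\ell = \frac{\tilde c_\ell}{\gamma_\ell} = \theta_\ell \frac{\tilde\gamma_\ell}{\gamma_\ell} + \epsilon \frac{\eta_\ell}{\gamma_\ell} \tag{1.7}$$

to estimate the Fourier coefficients of $f$ since, for large n, $\tilde\gamma_\ell / \gamma_\ell$ is close to 1 by the strong law of
large numbers.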
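
The deconvolution step can be sketched numerically by simulating the Fourier-domain observations of (1.5) directly and then averaging and dividing by $\gamma_\ell$. This is only an illustration under stated assumptions: the Laplace shift density, its scale, the coefficients `theta`, the common noise level and the function names are all hypothetical choices, and the noise is drawn independently across frequencies.

```python
import numpy as np


def simulate_fourier_observations(theta, ells, tau, eps, rng):
    """Draw c_{m,ell} = theta_ell exp(-i 2 pi ell tau_m) + eps z_{ell,m}, as in (1.5)."""
    shape = (tau.size, ells.size)
    # Complex Gaussian noise with zero mean and E|z|^2 = 1.
    z = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    return theta * np.exp(-2j * np.pi * np.outer(tau, ells)) + eps * z


def deconvolved_coefficients(c, gamma):
    """Average the c_{m,ell} over the curves (1.6), then divide by gamma_ell (1.7)."""
    c_bar = np.asarray(c).mean(axis=0)
    return c_bar / gamma


rng = np.random.default_rng(0)
n, eps, scale = 2000, 0.5, 0.1          # hypothetical sample size, noise level, Laplace scale
ells = np.arange(-3, 4)
theta = 1.0 / (1.0 + ells.astype(float) ** 2)   # hypothetical Fourier coefficients of f
# Fourier coefficients of a Laplace(0, scale) density: 1 / (1 + (2 pi ell scale)^2).
gamma = 1.0 / (1.0 + (2 * np.pi * ells * scale) ** 2)
tau = rng.laplace(0.0, scale, size=n)

c = simulate_fourier_observations(theta, ells, tau, eps, rng)
theta_hat = deconvolved_coefficients(c, gamma)
```

The error of `theta_hat` grows with $|\ell|$ because the noise is divided by a smaller $\gamma_\ell$, which is the amplification effect that the degree of ill-posedness measures.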

Based on the $\hat\theta_\ell$'s, two types of estimators are studied. The simplest one uses spectral cut-off
with a cutting frequency depending on the smoothness assumptions on $f$, and is thus
non-adaptive. The second estimator is based on wavelet thresholding and is shown to be adaptive
using the procedure developed in [17]. Note that all the theoretical results are presented for the
case where the coefficients $\gamma_\ell$ are known. Such a framework is commonly used in nonparametric
regression and inverse problems to obtain consistency results and to study asymptotic rates of
convergence, where it is generally supposed that the law of the additive error is Gaussian with
zero mean and known variance $\epsilon^2$, see e.g. [17], [27], [6]. In model (1.1), the random shifts may be
viewed as a second source of noise and for the theoretical analysis of this problem the law of this
other random noise is also supposed to be known. However, for some special classes of densities
$g$, it is possible to derive data-based estimators $\hat\gamma_\ell$ of the $\gamma_\ell$'s, see Section 5.



1.4 Previous work in mean pattern estimation 

The problem of estimating the common shape of a set of curves that differ only by a time 
transformation is usually referred to as the curve registration problem in statistics, and it has 
received a lot of attention over the last two decades, see e.g. [10], [11], [2], [30], [31], [21]. However,
in these papers, an asymptotic study as the number of curves n grows to infinity is not considered. 
In the simplest case of shifted curves, various approaches have been developed. Self-modelling
regression methods proposed by [TH| are semiparametric models where each observed curve is a 
parametric transformation of a common regression function, and estimation of the shape function 
in such models is studied in [18] with asymptotics in the number of curves. Based on a model
with a fixed number n of curves, semiparametric estimation of deformation parameters and
nonparametric estimation of the shape function is proposed in [22] and [33]. A generalization
of this approach for the estimation of scaling, rotation and translation parameters for
two-dimensional images is proposed in [4]. Estimation of a common shape for randomly shifted
curves and asymptotics in n is also considered in [31]. There is also a huge literature in image
analysis on mean pattern estimation, and some papers have recently addressed the problem of
estimating the common shape of a set of similar images with asymptotics in the number of images,
see e.g. [1], [23], [3] and references therein.

However, in all the above cited papers rates of convergence and optimality of the proposed 
estimators are generally not studied, and adaptive estimation of the common shape function in 
such models has not been considered yet. 



1.5 Organization of the paper 

In Section 2, we consider a linear but non-adaptive estimator based on spectral cut-off. In
Section 3, a nonlinear and adaptive estimator based on wavelet thresholding is studied, and
upper bounds for the minimax risk are derived over Besov balls. In Section 4, we derive lower
bounds for the minimax risk. In Section 5, it is explained how one can estimate the eigenvalues
$\gamma_\ell$ for some specific class of densities $g$. Finally, in Section 6, some numerical examples are
proposed to illustrate the performance of our approach and to compare it with another
algorithm proposed in the literature. All proofs are deferred to a technical Appendix at the end
of the paper.



2 Linear estimation of the common shape and upper bounds for 
the risk for Sobolev balls 

2.1 Risk decomposition 

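For $\ell \in \mathbb{Z}$, a linear estimator of the $\theta_\ell$'s is given by $\hat\theta_\ell^\lambda = \lambda_\ell \hat\theta_\ell$, where $\lambda = (\lambda_\ell)_{\ell \in \mathbb{Z}}$ is a sequence
of nonrandom weights called a filter. An estimator $\hat f_{n,\lambda}$ of $f$ is then obtained via the inverse
Fourier transform $\hat f_{n,\lambda}(x) = \sum_{\ell \in \mathbb{Z}} \hat\theta_\ell^\lambda e^{i 2\pi \ell x}$, and thanks to Parseval's relation, the risk of this
estimator is given by $\mathcal{R}(\hat f_{n,\lambda}, f) = \mathbb{E} \sum_{\ell \in \mathbb{Z}} |\hat\theta_\ell^\lambda - \theta_\ell|^2$. The problem is then to choose the sequence
$(\lambda_\ell)_{\ell \in \mathbb{Z}}$ in an optimal way. The following proposition gives the bias-variance decomposition of
$\mathcal{R}(\hat f_{n,\lambda}, f)$.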
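Proposition 2.1 For any given nonrandom filter $\lambda$, the risk of the estimator $\hat f_{n,\lambda}$ can be
decomposed as

$$\mathcal{R}(\hat f_{n,\lambda}, f) = \underbrace{\sum_{\ell \in \mathbb{Z}} (\lambda_\ell - 1)^2 |\theta_\ell|^2}_{\text{Bias}} + \underbrace{\frac{1}{n} \sum_{\ell \in \mathbb{Z}} \lambda_\ell^2 \left[ |\theta_\ell|^2 \left( \frac{1}{|\gamma_\ell|^2} - 1 \right) + \frac{\epsilon^2}{|\gamma_\ell|^2} \right]}_{\text{Variance}}. \tag{2.1}$$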



Note that the decomposition (2.1) does not correspond exactly to the classical bias-variance
decomposition for linear inverse problems. Indeed, the variance term in (2.1) differs from the
classical expression of the variance for linear estimators in statistical inverse problems, which would
be in our notations $\frac{\epsilon^2}{n} \sum_{\ell \in \mathbb{Z}} \frac{\lambda_\ell^2}{|\gamma_\ell|^2}$. Hence, contrary to classical inverse problems, the variance term
of the risk depends also on the Fourier coefficients $\theta_\ell$ of the unknown function $f$ to recover.
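
The decomposition (2.1) can be checked in a few lines from (1.6) and (1.7). The sketch below assumes that the linear estimator is $\hat\theta_\ell^\lambda = \lambda_\ell \hat\theta_\ell$ with real weights $\lambda_\ell$, and uses that $\xi_\ell$ and $\eta_\ell$ are centered and independent, so that the cross terms vanish, together with $\mathbb{E}|\xi_\ell|^2 = \frac{1}{n}|\theta_\ell|^2(1 - |\gamma_\ell|^2)$ and $\mathbb{E}|\eta_\ell|^2 = \frac{1}{n}$.

```latex
\begin{align*}
\hat\theta_\ell^\lambda - \theta_\ell
  &= (\lambda_\ell - 1)\,\theta_\ell + \lambda_\ell\,\frac{\xi_\ell + \epsilon\,\eta_\ell}{\gamma_\ell}, \\
\mathbb{E}\,|\hat\theta_\ell^\lambda - \theta_\ell|^2
  &= (\lambda_\ell - 1)^2 |\theta_\ell|^2
     + \frac{\lambda_\ell^2}{|\gamma_\ell|^2}
       \left( \mathbb{E}|\xi_\ell|^2 + \epsilon^2\,\mathbb{E}|\eta_\ell|^2 \right) \\
  &= (\lambda_\ell - 1)^2 |\theta_\ell|^2
     + \frac{\lambda_\ell^2}{n}
       \left[ |\theta_\ell|^2 \left( \frac{1}{|\gamma_\ell|^2} - 1 \right)
              + \frac{\epsilon^2}{|\gamma_\ell|^2} \right].
\end{align*}
```

Summing over $\ell \in \mathbb{Z}$ gives the bias and variance terms of (2.1). Setting $\xi_\ell = 0$ recovers the classical variance of a linear estimator in a white noise inverse problem.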
2.2 Linear estimation



Let us introduce the following smoothness class of functions which can be identified with a
periodic Sobolev ball

$$H_s(A) = \left\{ f \in L^2([0, 1]) \; ; \; \sum_{\ell \in \mathbb{Z}} (1 + |\ell|^{2s}) |\theta_\ell|^2 \le A \right\}$$

for some constant $A$ and some smoothness parameter $s > 0$, where $\theta_\ell = \int_0^1 e^{-2i\ell\pi x} f(x)\,dx$. Now
consider a linear estimator obtained by spectral cut-off, i.e. for a projection filter of the form
$\lambda_\ell^M = 1_{|\ell| \le M}$ for some integer $M$. For an appropriate choice of $M$, the following proposition
gives the asymptotic behavior of the risk $\mathcal{R}(\hat f_{n,\lambda^M}, f)$.

Proposition 2.2 Assume that $f$ belongs to $H_s(A)$ for some real $s > 1/2$ and $A > 0$, and
that $g$ satisfies Assumption 1.1, i.e. polynomial decay of the $\gamma_\ell$'s. Then, if $M = M_n$ is chosen such that
$M_n \sim n^{\frac{1}{2s + 2\nu + 1}}$, there exists a constant $C$ not depending on n such that as $n \to +\infty$

$$\sup_{f \in H_s(A)} \mathcal{R}(\hat f_{n,\lambda^M}, f) \le C n^{-\frac{2s}{2s + 2\nu + 1}}.$$

The above choice for $M_n$ depends on the smoothness $s$ of the function $f$, which is generally
unknown in practice, and such a spectral cut-off estimator is thus called non-adaptive. Moreover,
the result is only suited for smooth functions since Sobolev balls $H_s(A)$ for $s > 1/2$ are not
suited to model shape functions $f$ which may have singularities such as points of discontinuity.
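
As a rough illustration of the spectral cut-off estimator, the sketch below picks a cut-off frequency of the order given in Proposition 2.2 and evaluates the truncated Fourier series. The function names are hypothetical, the proportionality constant in $M_n$ is set to 1 as an assumption, and the toy input uses exact rather than estimated coefficients.

```python
import numpy as np


def cutoff_frequency(n, s, nu):
    """M_n of order n^(1/(2s + 2nu + 1)); the constant is set to 1 here (an assumption)."""
    return int(np.floor(n ** (1.0 / (2 * s + 2 * nu + 1))))


def spectral_cutoff_estimate(theta_hat, ells, M, x):
    """Evaluate the sum over |ell| <= M of theta_hat_ell * exp(i 2 pi ell x) at the points x."""
    keep = np.abs(ells) <= M
    basis = np.exp(2j * np.pi * np.outer(x, ells[keep]))
    # For a real-valued f the imaginary parts cancel, so only the real part is returned.
    return (basis @ theta_hat[keep]).real


# Toy check with the exact coefficients of f(x) = cos(2 pi x): theta_{+1} = theta_{-1} = 1/2.
ells = np.arange(-3, 4)
theta_hat = np.where(np.abs(ells) == 1, 0.5, 0.0).astype(complex)
x = np.linspace(0.0, 1.0, 5)
f_hat = spectral_cutoff_estimate(theta_hat, ells, M=1, x=x)
```

In practice `theta_hat` would hold the deconvolved coefficients of (1.7), and the need to know s in `cutoff_frequency` is exactly what makes this estimator non-adaptive.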



3 Nonlinear estimation with Meyer wavelets and upper bounds 
for the risk for Besov balls 

The advantage of wavelet methods is their ability to estimate spatially inhomogeneous
functions and to recover local features such as peaks or discontinuities. They can be used to
estimate functions in Besov spaces with optimal rates of convergence, and have therefore
received special attention in the literature over the last two decades, see e.g. [9]. Wavelets have
been successfully used for various inverse problems [8], and for the specific case of deconvolution
a special class of band-limited wavelets introduced by [26] has recently received special attention
in nonparametric regression, see [17] and [27].

3.1 Wavelet decomposition and the periodized Meyer wavelet basis 

This wavelet basis is derived through the periodization of the Meyer wavelet basis of $L^2(\mathbb{R})$. It
is constructed from a scaling function $\phi^*$ whose Fourier transform

$$\hat\phi^*(\omega) = \int_{\mathbb{R}} \phi^*(x) e^{-i\omega x}\,dx$$

vanishes for $|\omega| > 4\pi/3$ and is defined for $|\omega| \le 4\pi/3$ through a smooth function $h$ (see pi], [E]
for further details). In our simulations, we
use for $h$ a polynomial function of degree 3 to define the so-called Meyer window. Hence Meyer
wavelets are band-limited functions, which makes them very useful for deconvolution problems.
Indeed, let $(\phi^*, \psi^*)$ be the Meyer scaling and wavelet function respectively. Scaling and wavelet
functions at scale $j \ge 0$ are defined by

$$\phi^*_{j,k}(x) = 2^{j/2} \phi^*(2^j x - k) \quad \text{and} \quad \psi^*_{j,k}(x) = 2^{j/2} \psi^*(2^j x - k), \quad k = 0, \ldots, 2^j - 1.$$
As in [17] and [27], one can then define the periodized Meyer wavelet basis of $L^2([0, 1])$ by
periodizing the functions $(\phi^*, \psi^*)$, i.e. for $k = 0, \ldots, 2^j - 1$

$$\phi_{j,k}(x) = 2^{j/2} \sum_{i \in \mathbb{Z}} \phi^*(2^j (x + i) - k) \quad \text{and} \quad \psi_{j,k}(x) = 2^{j/2} \sum_{i \in \mathbb{Z}} \psi^*(2^j (x + i) - k).$$

For any function $f$ of $L^2([0, 1])$, its wavelet decomposition can be written as:

$$f = \sum_{k=0}^{2^{j_0} - 1} c_{j_0,k} \phi_{j_0,k} + \sum_{j=j_0}^{+\infty} \sum_{k=0}^{2^j - 1} \beta_{j,k} \psi_{j,k},$$

where $c_{j_0,k} = \int_0^1 f(x) \phi_{j_0,k}(x)\,dx$, $\beta_{j,k} = \int_0^1 f(x) \psi_{j,k}(x)\,dx$ and $j_0 \ge 0$ denotes the usual coarse
level of resolution. Moreover, the squared norm of $f$ is given by

$$\|f\|^2 = \sum_{k=0}^{2^{j_0} - 1} |c_{j_0,k}|^2 + \sum_{j=j_0}^{+\infty} \sum_{k=0}^{2^j - 1} |\beta_{j,k}|^2.$$

It is well known that Besov spaces can be characterized in terms of wavelet coefficients (see
e.g. P2]). Let $s > 0$ denote the usual smoothness parameter, then for the Meyer wavelet
basis and for a Besov ball $B^s_{p,q}(A)$ of radius $A > 0$ with $1 \le p, q \le \infty$, one has that

$$B^s_{p,q}(A) = \left\{ f \in L^2([0, 1]) : \left( \sum_{k=0}^{2^{j_0} - 1} |c_{j_0,k}|^p \right)^{\frac{1}{p}} + \left( \sum_{j=j_0}^{+\infty} 2^{j(s + 1/2 - 1/p)q} \left( \sum_{k=0}^{2^j - 1} |\beta_{j,k}|^p \right)^{\frac{q}{p}} \right)^{\frac{1}{q}} \le A \right\}$$

with the respective above sums replaced by maximum if $p = \infty$ or $q = \infty$.

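Meyer wavelets can be used to efficiently compute the coefficients $c_{j,k}$ and $\beta_{j,k}$ by using the
Fourier transform. Indeed, thanks to Plancherel's identity, one obtains that

$$\beta_{j,k} = \sum_{\ell \in \Omega_j} \overline{\psi_\ell^{j,k}}\, \theta_\ell,$$

where $\psi_\ell^{j,k} = \int_0^1 \psi_{j,k}(x) e^{-i 2\pi \ell x}\,dx$ denote the Fourier coefficients of $\psi_{j,k}$ and $\Omega_j = \{\ell \in \mathbb{Z} ; \psi_\ell^{j,k} \ne 0\}$.
As Meyer wavelets $\psi_{j,k}$ are band-limited, $\Omega_j$ is a finite subset of $[-2^{j+2} c_0, -2^j c_0] \cup
[2^j c_0, 2^{j+2} c_0]$ with $c_0 = 2\pi/3$ (see [17]), and fast algorithms for computing the above sum have
been proposed by [19] and [29]. The coefficients $c_{j_0,k}$ can be computed analogously with $\phi$ instead
of $\psi$ and $\Omega_{j_0} = \{\ell \in \mathbb{Z} ; \phi_\ell^{j_0,k} \ne 0\}$ instead of $\Omega_j$.

Hence, the noisy Fourier coefficients $\hat\theta_\ell$ given by (1.7) can be used to quickly compute the
following empirical wavelet coefficients of $f$ as

$$\hat\beta_{j,k} = \sum_{\ell \in \Omega_j} \overline{\psi_\ell^{j,k}}\, \hat\theta_\ell \quad \text{and} \quad \hat c_{j_0,k} = \sum_{\ell \in \Omega_{j_0}} \overline{\phi_\ell^{j_0,k}}\, \hat\theta_\ell. \tag{3.2}$$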

3.2 Nonlinear estimation via hard-thresholding 

It is well known that adaptivity can be obtained by using nonlinear estimators based on
appropriate thresholding of the estimated wavelet coefficients (see e.g. [9]). A non-linear estimator by
hard-thresholding is defined by

$$\hat f_n^h = \sum_{k=0}^{2^{j_0} - 1} \hat c_{j_0,k} \phi_{j_0,k} + \sum_{j=j_0}^{j_1} \sum_{k=0}^{2^j - 1} \hat\beta_{j,k} 1_{\{|\hat\beta_{j,k}| \ge \lambda_{j,k}\}} \psi_{j,k}, \tag{3.3}$$

where the $\lambda_{j,k}$'s are appropriate thresholds (positive numbers), and $j_1$ is the finest resolution
level used for the estimation. As shown by [17], for periodic deconvolution the choice for $j_1$
and the thresholds $\lambda_{j,k}$ typically depends on the degree $\nu$ of ill-posedness of the problem. More
precisely, Theorem 1 in [17] asserts that upper bounds for the minimax risk can be obtained if
there exist two constants $C_1, C_2$ such that for all $f \in B^s_{p,q}(A)$ (with appropriate assumptions on
$s, p, q, \nu$) the following conditions are satisfied

$$2^{j_1} = \left( \frac{n}{\log(n)} \right)^{\frac{1}{2\nu + 1}}, \tag{3.4}$$

$$\mathbb{E}|\hat\beta_{j,k} - \beta_{j,k}|^4 \le C_1 \left( \frac{2^{2\nu j}}{n} \right)^2, \tag{3.5}$$

$$\mathbb{P}\left( |\hat\beta_{j,k} - \beta_{j,k}| \ge \frac{\lambda_j}{2} \right) \le C_2 \left( \frac{\log(n)}{n} \right)^2, \tag{3.6}$$

for all $0 \le k \le 2^j - 1$ and $j_0 \le j \le j_1$, and where $\lambda_j$ is an appropriate threshold that satisfies

$$\lambda_j \le C 2^{\nu j} \sqrt{\frac{\log(n)}{n}}, \tag{3.7}$$

where $C$ is a sufficiently large constant not depending on $n, j, k$ and $f$.

To control moments of $\hat\beta_{j,k}$, one needs the following assumption on the decay of the density
$g(x)$ as $|x| \to +\infty$.

Assumption 3.1 There exists a constant $C > 0$ and a real $\alpha > 1$ such that the density $g$ satisfies
$g(x) \le \frac{C}{1 + |x|^\alpha}$ for all $x \in \mathbb{R}$.

Note that Assumption 3.1 is not a very restrictive condition as $g$ is supposed to be an integrable
function on $\mathbb{R}$. This can also be viewed as a sufficient condition to ensure the existence of the
density $G(x)$ introduced in (1.3).

Proposition 3.1 Let $1 \le p \le \infty$, $1 \le q \le \infty$, $s - 1/p + 1/2 > 0$ and $A > 0$. Assume that $g$
satisfies Assumptions 1.1 and 3.1. Then, there exist positive constants $C_3$ and $C_4$ such that for
any $j \ge j_0 \ge 0$, $0 \le k \le 2^j - 1$ and all $f \in B^s_{p,q}(A)$, $\mathbb{E}|\hat c_{j_0,k} - c_{j_0,k}|^2 \le C_3 \frac{2^{2 j_0 \nu}}{n}$,
$\mathbb{E}|\hat\beta_{j,k} - \beta_{j,k}|^2 \le C_3 \frac{2^{2 j \nu}}{n}$, and $\mathbb{E}|\hat\beta_{j,k} - \beta_{j,k}|^4 \le C_4 (\,\cdots + \cdots\,)$.

Proposition 3.1 shows that the variance of the empirical wavelet coefficients is proportional
to $\frac{2^{2 j \nu}}{n}$, which comes from the amplification of the noise by the inversion of the convolution
operator. Moreover, it also shows that if $2^{j_1} = O\left( \left( \frac{n}{\log(n)} \right)^{\frac{1}{2\nu + 1}} \right)$, then condition (3.5) is satisfied
for all $j_0 \le j \le j_1$.

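Generally the choice of the threshold $\lambda_{j,k}$ is done by controlling the probability of deviation
of the empirical wavelet coefficients $\hat\beta_{j,k}$ from the true wavelet coefficients $\beta_{j,k}$, which is given by
the following proposition:

Proposition 3.2 Let $f \in L^2([0, 1])$, $n \ge 1$ and $j \ge 0$. Suppose that $g$ satisfies Assumption 3.1.
For $0 \le k \le 2^j - 1$ and $\theta_\ell = \int_0^1 f(x) e^{-i 2\pi \ell x}\,dx$, define

$$\sigma_j^2 = 2^{-j} \epsilon^2 \sum_{\ell \in \Omega_j} |\gamma_\ell|^{-2}, \quad V_j^2 = \|g\|_\infty 2^{-j} \sum_{\ell \in \Omega_j} \frac{|\theta_\ell|^2}{|\gamma_\ell|^2} \quad \text{and} \quad \delta_j = 2^{-j/2} \sum_{\ell \in \Omega_j} \frac{|\theta_\ell|}{|\gamma_\ell|},$$

with $\|g\|_\infty = \sup_{x \in \mathbb{R}} \{g(x)\}$. Let $t > 0$, then,

$$\mathbb{P}\left( |\hat\beta_{j,k} - \beta_{j,k}| \ge 2 \max\left( \sigma_j \sqrt{\frac{2t}{n}}, \sqrt{\frac{2 V_j^2 t}{n}} + \delta_j \frac{t}{3n} \right) \right) \le 2 \exp(-t).$$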
Proposition 3.2 suggests taking level-dependent thresholds of the form

$$\lambda^*_{j,k} = \lambda^*_j = 2 \max\left( \sigma_j \sqrt{\frac{2\eta \log(n)}{n}}, \sqrt{\frac{2\eta V_j^2 \log(n)}{n}} + \delta_j \frac{\eta \log(n)}{3n} \right), \tag{3.8}$$

where $\eta > 0$ is a constant whose choice has to be discussed. If $\eta > 2$, then Proposition 3.2
implies that $\mathbb{P}\left( |\hat\beta_{j,k} - \beta_{j,k}| \ge \lambda^*_j \right) \le 2 n^{-2} \le 2 \left( \frac{\log(n)}{n} \right)^2$, and thus the threshold $\lambda^*_j$ satisfies
condition (3.6). Moreover, Lemma A.2 (see the Appendix) states that the Fourier coefficients
$\theta_\ell$ are uniformly bounded for all $f \in B^s_{p,q}(A)$ (provided $s - 1/p + 1/2 > 0$). Hence, since $\Omega_j$
is a finite subset of $[-2^{j+2} c_0, -2^j c_0] \cup [2^j c_0, 2^{j+2} c_0]$, one can easily check that there exists a
constant $C$ such that under Assumption 1.1

$$\lambda^*_j \le C 2^{j\nu} \sqrt{\frac{\log(n)}{n}}, \tag{3.9}$$

for all $f \in B^s_{p,q}(A)$. Hence, the threshold $\lambda^*_j$ also satisfies condition (3.7).
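
The ideal threshold of (3.8) and the hard-thresholding rule are simple to evaluate once $\sigma_j$, $V_j$ and $\delta_j$ are available. The sketch below is a minimal illustration that reads (3.8) as twice the maximum of a Gaussian term and a Bernstein-type term; the function names and the numerical inputs are hypothetical, and $\eta$ must be supplied by the caller.

```python
import numpy as np


def ideal_threshold(sigma_j, V_j, delta_j, n, eta):
    """Level-dependent threshold read from (3.8): twice the larger of the two deviation terms."""
    log_n = np.log(n)
    gaussian_term = sigma_j * np.sqrt(2 * eta * log_n / n)
    bernstein_term = np.sqrt(2 * eta * V_j**2 * log_n / n) + delta_j * eta * log_n / (3 * n)
    return 2 * max(gaussian_term, bernstein_term)


def hard_threshold(beta_hat, lam):
    """Keep the coefficients whose modulus is at least lam and set the others to zero."""
    beta_hat = np.asarray(beta_hat)
    return np.where(np.abs(beta_hat) >= lam, beta_hat, 0.0)


# Hypothetical inputs for one resolution level.
lam = ideal_threshold(sigma_j=1.0, V_j=0.5, delta_j=2.0, n=1000, eta=2.5)
kept = hard_threshold([0.05, -1.2, 0.3, 2.0], lam)
```

Since $V_j$ and $\delta_j$ depend on the unknown $\theta_\ell$, this function can only be used with estimated or bounded values of these quantities.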

The first term in the maximum (3.8) is the classical universal threshold with heteroscedastic
variance $\sigma_j^2$, which corresponds to an upper bound of the variance of the Gaussian term
$\epsilon \sum_{\ell \in \Omega_j} \overline{\psi_\ell^{j,k}} \frac{\eta_\ell}{\gamma_\ell}$
in the expression of $\hat\beta_{j,k}$. However, the second term in the maximum (3.8) depends on the modulus
of the unknown Fourier coefficients $\theta_\ell$, and thus the thresholds $\lambda^*_j$ cannot be used in practice.
As already explained in the introduction, this comes from the fact that the variance of the error
term $\xi_\ell$ in model (1.6) has to be estimated.
First, remark that if $f$ is such that

$$|\theta_\ell|^2 \max(\|g\|_\infty, 1) \le \epsilon^2 \quad \text{for all } \ell \in \Omega_j \text{ and } j \ge j_0, \tag{3.10}$$

then the threshold in (3.8) simplifies to



^ = \* = 2 [ ^H2M!) + r * E | 7 , r « j (3 .„, 



since under condition (3.10) $V_j^2 \le \sigma_j^2$. Hence, for n sufficiently large $\lambda^*_j \approx 2 \sigma_j \sqrt{\frac{2\eta \log(n)}{n}}$, and
thus one retrieves the classical form of a level-dependent threshold for wavelet deconvolution in
the white noise model as in [17].

Obviously, an assumption such as (3.10) would not hold for all $f$ in some arbitrary Besov ball
$B^s_{p,q}(A)$, and one must find a data-based upper bound for $|\theta_\ell|^2$. To do so, one can remark
that an unbiased estimator of $|\theta_\ell|^2$ is given by

$$\widehat{|\theta_\ell|^2} = \frac{1}{n} \sum_{m=1}^n |c_{m,\ell}|^2 - \epsilon^2, \tag{3.12}$$

where $c_{m,\ell} = \theta_\ell e^{-i 2\pi \ell \tau_m} + \epsilon z_{\ell,m}$, see equation (1.5). This gives a very accurate estimation of $|\theta_\ell|$
in the sense of the following lemma:



Lemma 3.1 Let $f \in L^2([0, 1])$, $n \ge 1$. Then, for any $j \ge 0$ and all $\ell \in \Omega_j$




2i A^2 /2rj \ 

e< N < Vl^l/ +e 2 + eA/— > l-2exp(-t) 

n \ n j 



for any t > 0. 
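Based on Lemma 3.1 we define the following random thresholds $\hat\lambda_j$

$$\hat\lambda_{j,k} = \hat\lambda_j = 2 \max\left( \sigma_j \sqrt{\frac{2\eta \log(n)}{n}}, \sqrt{\frac{2 \hat V_j^2 \eta \log(n)}{n}} + \hat\delta_j \frac{\eta \log(n)}{3n} \right), \tag{3.13}$$

with $\eta > 2$, $\hat V_j^2 = \|g\|_\infty 2^{-j} \sum_{\ell \in \Omega_j} \frac{|\tilde\theta_\ell|^2}{|\gamma_\ell|^2}$ and $\hat\delta_j = 2^{-j/2} \sum_{\ell \in \Omega_j} \frac{|\tilde\theta_\ell|}{|\gamma_\ell|}$, where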



N = vi^ 2 W 2l0g(n22 " 



The quantity $|\tilde\theta_\ell|$ is a data-based upper bound for $|\theta_\ell|$ that is plugged into the definition of the
ideal threshold $\lambda^*_j$ to obtain the data-driven thresholds $\hat\lambda_j$ defined by (3.13). Note that such
thresholds are not difficult to calculate in practice, and more details on their computation are
given in Section 6 on numerical experiments. Then, the following proposition holds:

Proposition 3.3 Let $f \in L^2([0, 1])$ and $n \ge 1$. Then, there exists a constant $C$ not depending
on $f$ and n, such that for all $j \ge 0$ and $0 \le k \le 2^j - 1$

$$\mathbb{P}\left( |\hat\beta_{j,k} - \beta_{j,k}| \ge \hat\lambda_j \right) \le C n^{-2}.$$



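Combining the above arguments, we finally arrive at the following theorem which gives an
upper bound for the minimax risk over a large class of Besov balls.

Theorem 3.1 Assume that $g$ satisfies Assumptions 1.1 and 3.1. Let $j_1$ and $j_0$ be the largest
integers such that $2^{j_1} \le \left( \frac{n}{\log(n)} \right)^{\frac{1}{2\nu + 1}}$ and $2^{j_0} \le \log(n)$. Let $\hat f_n^h$ be the non-linear estimator
obtained by hard-thresholding with the above choice for $j_1$ and $j_0$, and using the thresholds $\hat\lambda_j$
defined by equation (3.13). Let $1 \le p \le \infty$, $1 \le q \le \infty$ and $A > 0$. Let $p' = \min(2, p)$,
$s' = s + 1/2 - 1/p$, and assume that $s > 1/p'$.

If $s > (2\nu + 1)(1/p - 1/2)$, then

$$\sup_{f \in B^s_{p,q}(A)} \mathbb{E}\|\hat f_n^h - f\|^2 = O\left( n^{-\frac{2s}{2s + 2\nu + 1}} (\log n)^\beta \right) \quad \text{with } \beta = \frac{2s}{2s + 2\nu + 1}.$$

If $s < (2\nu + 1)(1/p - 1/2)$, then

$$\sup_{f \in B^s_{p,q}(A)} \mathbb{E}\|\hat f_n^h - f\|^2 = O\left( \left( \frac{\log(n)}{n} \right)^{\frac{2s'}{2s' + 2\nu}} \right).$$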



In standard periodic deconvolution in the white noise model (see e.g. [17]), there exist two
different upper bounds for the minimax rate which are usually referred to as the dense case
($s > (2\nu + 1)(1/p - 1/2)$) when the hardest functions to estimate are spread uniformly over [0, 1],
and the sparse case ($s < (2\nu + 1)(1/p - 1/2)$) when the worst functions to estimate have only
one non-vanishing wavelet coefficient. Theorem 3.1 shows that a similar phenomenon holds for
the model (1.1), and to the best of our knowledge, this is a new result.

In the following section, the rate $n^{-\frac{2s}{2s + 2\nu + 1}}$ is shown to be, in the dense case, a lower bound
for the asymptotic decay of the minimax risk $\mathcal{R}_n(B^s_{p,q}(A))$. Hence, when the hardest functions
to estimate are spread uniformly over [0, 1], the estimator $\hat f_n^h$ is near-optimal up to a logarithmic
term. This phenomenon is well known in nonparametric regression with wavelets, see e.g. [9],
[27], and this extra logarithmic factor is usually referred to as the price for adaptivity. Here,
adaptivity must be understood in the sense that the proposed threshold and the choice of the
finest resolution level $j_1$ do not depend on the smoothness parameter $s$. Recall that the space
$B^s_{p,q}(A)$ with $1 \le p < 2$ contains piecewise smooth functions with local irregularities such as
peaks or discontinuities, while the space $B^s_{2,2}(A)$ is a Sobolev ball if $s$ is not an integer. Hence,
contrary to the linear estimator proposed in Section 2, the above described non-linear estimator
is suitable for the estimation of irregular functions, and is adaptive with respect to the unknown
smoothness parameter $s$.

4 Minimax lower bound 

The following theorem gives an asymptotic lower bound on the minimax risk $\mathcal{R}_n(B^s_{p,q}(A))$ for a
large class of Besov balls.

Theorem 4.1 Let $1 \le p \le \infty$, $1 \le q \le \infty$ and $A > 0$. Suppose that $g$ satisfies Assumption 1.1.
Let $p' = \min(2, p)$. Assume that $s > 1/p'$ and $\nu > 1/2$.

If $s > (2\nu + 1)(1/p - 1/2)$ and $s > 2\nu + 1$ (dense case), there exists a constant $M_1$ depending only
on $A, s, p, q$ such that

$$\mathcal{R}_n(B^s_{p,q}(A)) \ge M_1 n^{-\frac{2s}{2s + 2\nu + 1}} \quad \text{as } n \to +\infty.$$

In the dense case, the hardest functions to estimate are spread uniformly over the interval
[0, 1], and the proof is based on an adaptation of Assouad's cube technique (see e.g. Lemma 10.2 in
[14]) to the specific setting of model (1.1). Lower bounds for minimax risk are classically derived
by controlling the probability for the likelihood ratio (in the statistical model of interest) of
being strictly greater than some constant uniformly over an appropriate set of test functions. To
derive Theorem 4.1, we show that one needs to control the expectation over the random shifts of
the likelihood ratio associated to model (1.1), and not only the likelihood ratio itself. Hence, the
proof of Theorem 4.1 is not a direct and straightforward adaptation of Assouad's cube technique
or Lemma 10.1 in [14] as used classically in a standard white noise model to derive minimax
risk in nonparametric deconvolution in the dense case. For more details, we refer to the proof of
Theorem 4.1 in the Appendix.

Deriving minimax risk in the dense case for the model (1.1) is rather difficult and the proof
is quite long and technical. In the sparse case, finding lower bounds for the minimax rate is also
a difficult task. We believe that this could be done by adapting to model (1.1) a result by [20]
which yields a lower bound for a specific problem of distinguishing between a finite number of
hypotheses (see Lemma 10.1 in [14]). However, this is far beyond the scope of this paper and we
leave this problem open for future work.

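5 Estimation of the eigenvalues $\gamma_\ell$

Define the following specific class of densities

$$\mathcal{F}^+ = \left\{ g : \mathbb{R} \to \mathbb{R}^+, \int_{\mathbb{R}} g(x)\,dx = 1, \text{ with } \gamma_\ell = \int_{\mathbb{R}} e^{-i 2\pi \ell x} g(x)\,dx = \gamma_{-\ell} \text{ and } \gamma_\ell > 0 \text{ for all } \ell \in \mathbb{Z} \right\}.$$

Examples of densities belonging to $\mathcal{F}^+$ include the Gaussian or Laplace distribution with zero
mean.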

Next we give a series of arguments to explain how one can estimate the $\gamma_\ell$'s if $g \in \mathcal{F}^+$. First recall that the quantity $\frac{1}{n}\sum_{m=1}^n |c_{m,\ell}|^2 - \frac{1}{n}\sum_{m=1}^n \epsilon_m^2$ is an unbiased estimator of $|\theta_\ell|^2$. Similarly, one can remark that $|\tilde c_\ell|^2$ is an unbiased estimator of $|\gamma_\ell|^2|\theta_\ell|^2$, where by definition $\tilde c_\ell = \frac{1}{n}\sum_{m=1}^n c_{m,\ell}$. Therefore, this suggests that $|\gamma_\ell|^2$ can be estimated by

$$\vartheta_\ell = \frac{|\tilde c_\ell|^2}{\frac{1}{n}\sum_{m=1}^n |c_{m,\ell}|^2 - \frac{1}{n}\sum_{m=1}^n \epsilon_m^2},$$

and if one assumes that $g \in \mathcal{F}^+$ then an estimator for $\gamma_\ell$ is given by $\hat\gamma_\ell = \sqrt{\vartheta_\ell}$. In such a case, one has immediately the following result by applying the strong law of large numbers:

Lemma 5.1 Assume that $g \in \mathcal{F}^+$ and that $\theta_\ell \neq 0$ for all $\ell \in \mathbb{Z}$. Moreover suppose that $\epsilon_m^2$ is known for all $m = 1, \ldots, n$. Then, as $n \to +\infty$, $\hat\gamma_\ell \to \gamma_\ell$ almost surely.

The estimated value $\hat\gamma_\ell$ can then be plugged into the definition of the wavelet estimator and the thresholds given in Section 3.
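The ratio estimator described in this section can be illustrated on simulated Fourier coefficients. The following is only a minimal sketch under stated assumptions: the coefficients are generated as $c_{m,\ell} = \theta_\ell e^{-i2\pi\ell\tau_m} + \epsilon z_{\ell,m}$ with Laplace shifts, the numerator is taken to be the plain squared modulus of the averaged coefficients, and the function names `simulate_coefficients` and `estimate_gamma` are hypothetical (they do not come from the paper or from any toolbox).

```python
import numpy as np

def simulate_coefficients(theta, ells, sigma, eps, n, rng):
    """Simulate c[m, l] = theta_l * exp(-2i*pi*l*tau_m) + eps * z[m, l] (assumed model)."""
    # Laplace shifts with standard deviation sigma (scale b = sigma / sqrt(2))
    tau = rng.laplace(0.0, sigma / np.sqrt(2.0), size=n)
    # standard complex Gaussian noise: E|z|^2 = 1
    z = (rng.standard_normal((n, len(ells)))
         + 1j * rng.standard_normal((n, len(ells)))) / np.sqrt(2.0)
    phase = np.exp(-2j * np.pi * tau[:, None] * ells[None, :])
    return theta[None, :] * phase + eps * z

def estimate_gamma(c, eps2):
    """Square root of the ratio |mean_m c|^2 / (mean_m |c|^2 - mean_m eps_m^2)."""
    num = np.abs(c.mean(axis=0)) ** 2                    # estimates |gamma_l|^2 |theta_l|^2
    den = (np.abs(c) ** 2).mean(axis=0) - eps2.mean()    # estimates |theta_l|^2
    ratio = np.clip(num / den, 0.0, None)                # guard against negative ratios
    return np.sqrt(ratio)
```

The clipping step is a practical safeguard that is not discussed in the text; when the denominator is close to zero (weak signal at frequency $\ell$) the estimate is unreliable, which matches the requirement $\theta_\ell \neq 0$ in Lemma 5.1.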

6 Numerical experiments 

We compare our approach with the Procrustean mean, which is a standard algorithm commonly used to extract a mean pattern. The Procrustean mean is based on an alternating scheme between estimation of the shifts and averaging of back-transformed curves given estimated values of the shift parameters, see e.g. [34], [18]. To be more precise, it consists of an initialization step $\hat f_0 = \frac{1}{n}\sum_{m=1}^n Y_m$, which is the simple average of the observed curves and is taken as a first reference mean pattern. Then, at iteration $1 \leq i \leq i_{\max}$, it computes for all $1 \leq m \leq n$ an estimate $\hat\tau_{m,i}$ of the $m$-th shift as $\hat\tau_{m,i} = \arg\min_{\tau \in \mathbb{R}} \|Y_m(\cdot + \tau) - \hat f_{i-1}\|^2$ and then takes $\hat f_i(x) = \frac{1}{n}\sum_{m=1}^n Y_m(x + \hat\tau_{m,i})$ as a new reference mean pattern. This is repeated until the estimated reference curve does not change, and usually the algorithm converges in a few steps (we took $i_{\max} = 3$). In all simulations, we used the wavelet toolbox Wavelab [5] and the WaveD algorithm developed by [29] for fast deconvolution with Meyer wavelets.
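The alternating scheme behind the Procrustean mean can be sketched as follows. This is an illustrative simplification and not the implementation used for the experiments: curves are assumed to be sampled on a regular grid of a periodic domain, shifts are restricted to integer grid steps, and the least-squares shift is found by maximising the circular cross-correlation computed with the FFT. The names `best_circular_shift` and `procrustean_mean` are hypothetical.

```python
import numpy as np

def best_circular_shift(y, ref):
    """Integer shift s minimising ||y(. + s) - ref||^2 over circular shifts."""
    # corr[s] = sum_x y[(x + s) % N] * ref[x]; minimising the squared distance
    # is equivalent to maximising this cross-correlation.
    corr = np.fft.ifft(np.fft.fft(y) * np.conj(np.fft.fft(ref))).real
    return int(np.argmax(corr))

def procrustean_mean(Y, n_iter=3):
    """Alternate between shift estimation and averaging of realigned curves.

    Y : array of shape (n, N), one discretised curve per row.
    """
    f = Y.mean(axis=0)                      # initial reference: the direct mean
    shifts = [0] * len(Y)
    for _ in range(n_iter):
        shifts = [best_circular_shift(y, f) for y in Y]
        # np.roll(y, -s)[x] = y[x + s], i.e. the back-transformed curve
        f = np.mean([np.roll(y, -s) for y, s in zip(Y, shifts)], axis=0)
    return f, shifts
```

Note that the output is only defined up to a global shift, since all curves are aligned to a reference that is itself built from shifted data.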

6.1 Estimation when the $\gamma_\ell$'s are unknown

For the mean pattern $f$ to recover, we consider the four test functions shown in Figures 1a-4a. Then, we simulate $n = 200$ randomly shifted curves with shifts following a Laplace distribution $g(x) = \frac{1}{\sqrt{2}\sigma}\exp\left(-\sqrt{2}\frac{|x|}{\sigma}\right)$ with $\sigma = 0.1$. Gaussian noise with a moderate variance (different from that used in the Laplace distribution) is then added to each curve. A subsample of 10 curves is shown in Figures 1b-4b for each test function, and the average of the observed curves, referred to as the direct mean in what follows, is displayed in Figures 1c-4c. Note that this gives a poor estimate of the mean pattern.

The Fourier coefficients of the density $g$ are given by $\gamma_\ell = \frac{1}{1 + 2\sigma^2\pi^2\ell^2}$, which corresponds to a degree of ill-posedness $\nu = 2$. Hence, this density belongs to the class $\mathcal{F}^+$ defined by (5.1), and thus an estimate of $\gamma_\ell$ can be performed as explained in Section 5. After deconvolution of the direct mean, Theorem 3.1 suggests to take a threshold of the form

$$\hat\lambda_j = 2\max\left( \sigma_j\sqrt{\frac{2\eta\log(n)}{n}}, \ \sqrt{\frac{2V_j^2\eta\log(n)}{n}} + \delta_j\frac{\eta\log(n)}{3n} \right).$$

In our simulations we have found that choosing $\eta$ between 1 and 2 gives quite satisfactory results. However, some quantities in the definition of $\hat\lambda_j$ have to be estimated. The following numerical experiments are thus given for the threshold

. „ / /21og(n) 1 2V 2 log{n) log(n)\ 
A, = 2max ^ + j 

with 

where $\hat\sigma_j^2 = 2^{-j}\hat\epsilon^2\sum_{\ell \in \Omega_j}|\hat\gamma_\ell|^{-2}$ with $\hat\epsilon^2 = \frac{1}{n}\sum_{m=1}^n \hat\epsilon_m^2$, and where the variance of the noise for the $m$-th curve is easily estimated using the wavelet coefficients at the finest resolution level. Note that this threshold is quite simple to compute using the Fast Fourier Transform to calculate the $|\hat\theta_\ell|$'s, and the fact that the set of frequencies $\Omega_j$ can be easily obtained using WaveLab.
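The closed form of the Fourier coefficients of the Laplace density used in this section, $\gamma_\ell = 1/(1 + 2\sigma^2\pi^2\ell^2)$, can be checked numerically. The sketch below compares a plain Riemann sum of $\int e^{-i2\pi\ell x}g(x)dx$ on a truncated interval with the closed form; the truncation width and grid size are arbitrary choices made for this illustration.

```python
import numpy as np

def laplace_density(x, sigma):
    """Laplace density with zero mean and standard deviation sigma."""
    return np.exp(-np.sqrt(2.0) * np.abs(x) / sigma) / (np.sqrt(2.0) * sigma)

def fourier_coefficient(ell, sigma, half_width=3.0, num=600001):
    """Riemann-sum approximation of the integral of exp(-2i*pi*ell*x) g(x) over R."""
    x = np.linspace(-half_width, half_width, num)
    dx = x[1] - x[0]
    integrand = np.exp(-2j * np.pi * ell * x) * laplace_density(x, sigma)
    return np.sum(integrand) * dx

def closed_form(ell, sigma):
    """gamma_ell = 1 / (1 + 2 sigma^2 pi^2 ell^2), decaying like |ell|^(-2)."""
    return 1.0 / (1.0 + 2.0 * sigma ** 2 * np.pi ** 2 * ell ** 2)
```

The polynomial decay in $|\ell|^{-2}$ visible in `closed_form` is what corresponds to the degree of ill-posedness $\nu = 2$ mentioned above.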
Then, we took $j_0 = 3 \approx \log_2(\log(n))$, but the choice $j_1 \approx \frac{1}{2\nu+1}\log_2\left(\frac{n}{\log(n)}\right)$ is obviously too small. So in our simulations, $j_1$ is chosen to be the maximum resolution level allowed by the discretization, i.e. $j_1 = \log_2(N) - 1 = 7$. The modulus of the empirical wavelet coefficients together with the value of the threshold $\hat\lambda_j$ are shown in Figures 1d-4d for each test function, and the resulting estimators are displayed in Figures 1e-4e. The Procrustean mean is displayed in Figures 1f-4f, and one can see that both methods give similar and quite satisfactory results. In particular, it should be noted that these results are obtained in the case of an unknown density $g$, which confirms the quality of the estimation procedure proposed in Section 5 for the $\gamma_\ell$'s. For reasons of space a detailed simulation study is not given, but it has been found that the good performances of the wavelet-based estimator remain consistent across other standard test signals.




Figure 1: Wave function. (a) Mean pattern $f$, (b) Sample of 10 curves out of $n = 200$, (c) Direct mean, (d) Empirical wavelet coefficients after deconvolution (ordered from small to large resolution levels) and value of the threshold, (e) Deconvolution by wavelet thresholding, (f) Procrustean mean.




Figure 2: HeaviSine function. (a) Mean pattern $f$, (b) Sample of 10 curves out of $n = 200$, (c) Direct mean, (d) Empirical wavelet coefficients after deconvolution (ordered from small to large resolution levels) and value of the threshold, (e) Deconvolution by wavelet thresholding, (f) Procrustean mean.

6.2 Estimation when the $\gamma_\ell$'s are known

Now, consider the step function shown in Figure 5a and let us take for $g$ the density displayed in Figure 5b, that is, a mixture of two Laplace distributions with different means. Estimation by wavelet thresholding is quite satisfactory except for the pseudo-Gibbs phenomena at the discontinuity points of $f$, which could be reduced by using a translation invariant wavelet decomposition.



Figure 3: Blocks function. (a) Mean pattern $f$, (b) Sample of 10 curves out of $n = 200$, (c) Direct mean, (d) Empirical wavelet coefficients after deconvolution (ordered from small to large resolution levels) and value of the threshold, (e) Deconvolution by wavelet thresholding, (f) Procrustean mean.

Figure 4: Doppler function. (a) Mean pattern $f$, (b) Sample of 10 curves out of $n = 200$, (c) Direct mean, (d) Empirical wavelet coefficients after deconvolution (ordered from small to large resolution levels) and value of the threshold, (e) Deconvolution by wavelet thresholding, (f) Procrustean mean.



The Procrustean mean is not very satisfactory as it is composed of two step functions (for a fair comparison the estimated shifts at iteration $i_{\max} = 3$ have been centered at the true mean of the density $g$). Note that the iterative procedure used to compute the Procrustean mean is initialized with the direct mean $\hat f_0$, which is a very poor estimate of $f$ as it is the sum of two step functions with different amplitudes. Hence, this makes the Procrustean mean a non-consistent procedure for the estimation of $f$, which somewhat illustrates the crucial dependence of such an iterative method on the initialization step.

7 Conclusions and future work 

This paper makes a connection between mean pattern estimation and the statistical analysis of 
inverse problems for a very simple model with shifted curves. A natural extension would be to 
consider more complex deformations such as the homothetic shifted regression model proposed 
in [33], or the rigid deformation model for images considered in [3]. 

Another promising approach to estimate a mean pattern would be to take a Bayesian point of view, and then to estimate the common shape via the EM algorithm where the unobserved random shifts are considered as hidden variables. Such an approach has been proposed in [1] to compute the mean pattern of a set of images. However, the approach followed in [1] has not been shown to be consistent from a nonparametric point of view. Moreover, optimality of such a Bayesian procedure remains to be studied.

Figure 5: Step function. (a) Mean pattern $f$, (b) Density $g$ of the shifts, (c) Direct mean, (d) Empirical wavelet coefficients ordered from small to large resolution levels, and value of the threshold, (e) Deconvolution by wavelet thresholding, (f) Procrustean mean centered at the true mean of the shifts.

A Appendix section 

In what follows C will denote a generic constant whose value may change from line to line. 
Proof of Theorem 1.1: it follows immediately from Theorem 3.1 and Theorem 4.1. □

) for all f£Z. 



Proof of Proposition ED let n e = - lj 9 e and e*, n = f ( (± £™ =1 z t 

Then, for a given filter A, the risk TZ(fn : \, /) can be written as 



eel 



+Xe(Xe - l)E[9£€£ >n + 6eee,n] + A 2 E[/«£e^ n + K£€£ >n ]. 
Now using the fact that K£ and ti^ n are independent and that Ke£ in = 0, we obtain that 



R(fn,X, f) 



(X e - l) 2 \9 e \ 2 + X 2 e \9 e \ 2 E 



E 

kez 

I>-i) 2 N 2 



it 
it 



2J1 



+ 



ill 

n\lt\ 



eel 



1 n- 1 

- H ia-t 



n 



n 



2J2 



n he\ 



\ 2 
n 



nt\ 



i + 



= £(A,-l) 2 N 2 + £ 

leTL leTL 

which completes the proof. 

Proof of Proposition 2.2: from Proposition 2.1 it follows that



□ 



\t\>M n 



\<M n 



nt\ 
By assumption $f \in H_s(A)$, which implies that there exist two positive constants $C_1$ and $C_2$ not depending on $f$ and $n$ such that for all sufficiently large $n$, $\sum_{|\ell| > M_n}|\theta_\ell|^2 \leq C_1 M_n^{-2s}$ and $\frac{1}{n}\sum_{|\ell| \leq M_n}|\theta_\ell|^2 \leq C_2 n^{-1}$. Now, given that $g$ satisfies Assumption 1.1, it follows that there exists a positive constant $C_3$ not depending on $f$ and $n$ such that for all sufficiently large $n$, $\frac{1}{n}\sum_{|\ell| \leq M_n}\frac{|\theta_\ell|^2 + \epsilon^2}{|\gamma_\ell|^2} \leq C_3 n^{-1}M_n^{2\nu+1}$. Hence the result immediately follows from the choice $M_n \sim n^{\frac{1}{2s+2\nu+1}}$, which completes the proof. □
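The choice $M_n \sim n^{1/(2s+2\nu+1)}$ balances a bias term of order $M^{-2s}$ against a variance term of order $M^{2\nu+1}/n$. As a rough numerical illustration (with all constants set to one, which is an assumption made only for this sketch), one can minimise the sum of the two terms over integer cutoffs and compare with the continuous minimiser $(2sn/(2\nu+1))^{1/(2s+2\nu+1)}$. The function names are hypothetical.

```python
import numpy as np

def risk_bound(M, n, s, nu):
    """Bias-variance proxy M^(-2s) + M^(2nu+1)/n, with constants set to 1."""
    return M ** (-2.0 * s) + M ** (2.0 * nu + 1.0) / n

def best_cutoff(n, s, nu, max_cutoff=10000):
    """Integer frequency cutoff minimising the proxy over 1..max_cutoff."""
    M = np.arange(1, max_cutoff + 1, dtype=float)
    return M[np.argmin(risk_bound(M, n, s, nu))]

def continuous_minimiser(n, s, nu):
    """Stationary point of the proxy: M = (2 s n / (2 nu + 1))^(1/(2s+2nu+1))."""
    return (2.0 * s * n / (2.0 * nu + 1.0)) ** (1.0 / (2.0 * s + 2.0 * nu + 1.0))
```

Plugging the minimiser back into the proxy gives a value of order $n^{-2s/(2s+2\nu+1)}$, the rate appearing throughout the paper.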
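For the proof of Propositions 3.1 and 3.2 let us remark that $\hat\beta_{j,k} - \beta_{j,k} = Z_1 + Z_2$ with

$$Z_1 = \sum_{\ell \in \Omega_j}\psi_\ell^{j,k}\theta_\ell\left(\frac{\tilde\gamma_\ell}{\gamma_\ell} - 1\right) \quad \text{and} \quad Z_2 = \epsilon\sum_{\ell \in \Omega_j}\frac{\psi_\ell^{j,k}}{\gamma_\ell}\eta_\ell. \quad (A.1)$$

Under Assumption 3.1, $G(x) = \sum_{m \in \mathbb{Z}} g(x + m)$ exists for all $x \in [0, 1]$ and is a bounded density. Throughout the proof we use the following lemma whose proof is straightforward:

Lemma A.1 Let $h \in L^2([0, 1])$ be a 1-periodic function on $\mathbb{R}$. Then, $\int_{\mathbb{R}} h(x)g(x)dx = \int_0^1 h(x)G(x)dx$.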
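Lemma A.1 can be checked numerically on a simple example. The sketch below assumes a Gaussian density $g$ (an illustrative choice, not the density used in the experiments), truncates the periodisation $G(x) = \sum_m g(x+m)$ to finitely many terms, and compares midpoint-rule approximations of both sides. All function names are hypothetical.

```python
import numpy as np

def gaussian_density(x, sigma):
    return np.exp(-x ** 2 / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))

def periodized_density(x, sigma, n_terms=10):
    """G(x) = sum_m g(x + m), truncated to |m| <= n_terms."""
    m = np.arange(-n_terms, n_terms + 1)
    return gaussian_density(x[:, None] + m[None, :], sigma).sum(axis=1)

def lemma_a1_sides(h, sigma, points_per_unit=1000, half_width=10):
    """Midpoint-rule approximations of int_R h g and int_0^1 h G."""
    dx = 1.0 / points_per_unit
    # integral over R, truncated to [-half_width, half_width]
    x_line = -half_width + (np.arange(2 * half_width * points_per_unit) + 0.5) * dx
    lhs = np.sum(h(x_line) * gaussian_density(x_line, sigma)) * dx
    # integral over [0, 1] against the periodised density
    x_unit = (np.arange(points_per_unit) + 0.5) * dx
    rhs = np.sum(h(x_unit) * periodized_density(x_unit, sigma)) * dx
    return lhs, rhs
```

For $h(x) = \cos(2\pi x)$ both sides should be close to $e^{-2\pi^2\sigma^2}$, the first Fourier coefficient of the Gaussian density.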



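Proof of Proposition 3.1: first note that since $|\psi_\ell^{j,k}| \leq 2^{-j/2}$ and $\Omega_j \subset [-2^{j+2}c_0, -2^jc_0] \cup [2^jc_0, 2^{j+2}c_0]$, see [17], it follows that $\#\{\Omega_j\} \leq 4\pi 2^j$ and that under Assumption 1.1 $|\gamma_\ell|^{-2} \sim 2^{2j\nu}$ for all $\ell \in \Omega_j$. This implies that there exists a constant $C > 0$ such that

$$\sum_{\ell \in \Omega_j}\frac{|\psi_\ell^{j,k}|^2}{|\gamma_\ell|^2} \leq C2^{2j\nu} \quad \text{and} \quad \sum_{\ell \in \Omega_j}\frac{|\psi_\ell^{j,k}|}{|\gamma_\ell|} \leq C2^{j(\nu+1/2)}. \quad (A.2)$$

Then, we need the following lemma which shows that the Fourier coefficients $\theta_\ell = \int_0^1 e^{-2i\pi\ell x}f(x)dx$ are uniformly bounded for all $f \in B^s_{p,q}(A)$.

Lemma A.2 Let $1 < p < \infty$, $1 < q < \infty$, $s - 1/p + 1/2 > 0$ and $A > 0$. Then, there exists a constant $A' > 0$ such that for all $f \in B^s_{p,q}(A)$ and all $\ell \in \mathbb{Z}$, $|\theta_\ell| \leq A'$.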

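Proof: since $|\phi_\ell^{j_0,k}| \leq 2^{-j_0/2}$ and $|\psi_\ell^{j,k}| \leq 2^{-j/2}$, one can remark using the Cauchy-Schwarz inequality that for any $j_0 \geq 0$

$$|\theta_\ell| \leq \sum_{k=0}^{2^{j_0}-1}|c_{j_0,k}||\phi_\ell^{j_0,k}| + \sum_{j=j_0}^{+\infty}\sum_{k=0}^{2^j-1}|\beta_{j,k}||\psi_\ell^{j,k}| \leq \left(\sum_{k=0}^{2^{j_0}-1}|c_{j_0,k}|^2\right)^{1/2} + \sum_{j=j_0}^{+\infty}\left(\sum_{k=0}^{2^j-1}|\beta_{j,k}|^2\right)^{1/2}.$$

Now using the inequality $\left(\sum_{r=1}^m |a_r|^2\right)^{1/2} \leq m^{(1/2-1/p)_+}\left(\sum_{r=1}^m |a_r|^p\right)^{1/p}$ for the $\ell_p$-norm in $\mathbb{R}^m$, it follows that

$$|\theta_\ell| \leq 2^{j_0(1/2-1/p)_+}\left(\sum_{k=0}^{2^{j_0}-1}|c_{j_0,k}|^p\right)^{1/p} + \sum_{j=j_0}^{+\infty}2^{j(1/2-1/p)_+}\left(\sum_{k=0}^{2^j-1}|\beta_{j,k}|^p\right)^{1/p}.$$

Since $f \in B^s_{p,q}(A)$, one has that $\left(\sum_{k=0}^{2^j-1}|\beta_{j,k}|^p\right)^{1/p} \leq A2^{-j(s+1/2-1/p)}$ and $\left(\sum_{k=0}^{2^{j_0}-1}|c_{j_0,k}|^p\right)^{1/p} \leq A$, which implies that $|\theta_\ell| \leq A2^{j_0(1/2-1/p)_+} + A\sum_{j=j_0}^{+\infty}2^{-j(s+1/2-1/p-(1/2-1/p)_+)}$. Taking for instance $j_0 = 0$ completes the proof since by assumption $s + 1/2 - 1/p > 0$. □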

Control of $E|\hat\beta_{j,k} - \beta_{j,k}|^2$ (the proof to control $E|\hat c_{j_0,k} - c_{j_0,k}|^2$ follows from the same arguments): from the decomposition (A.1) it follows that $E|\hat\beta_{j,k} - \beta_{j,k}|^2 \leq 2E|Z_1|^2 + 2E|Z_2|^2$. Since the $\eta_\ell$ are iid $N_{\mathbb{C}}(0, 1/n)$, the bound (A.2) implies that

$$E|Z_2|^2 \leq C\frac{2^{2j\nu}}{n}. \quad (A.3)$$

Then, let us write $Z_1 = \frac{1}{n}\sum_{m=1}^n (W_m - EW_m)$ with $W_m = h_{j,k}(\tau_m)$ and $h_{j,k}(\tau) = \sum_{\ell \in \Omega_j}\frac{\psi_\ell^{j,k}\theta_\ell}{\gamma_\ell}e^{-i2\pi\ell\tau}$ for $\tau \in \mathbb{R}$. By independence of the $\tau_m$'s, one has that $E|Z_1|^2 \leq \frac{1}{n}E|W_1|^2$. Applying Lemma A.1 with $h = |h_{j,k}|^2$ and since the density $G$ is bounded, it follows that

$$E|W_1|^2 = \int_{\mathbb{R}}|h_{j,k}(\tau)|^2 g(\tau)d\tau = \int_0^1 |h_{j,k}(\tau)|^2 G(\tau)d\tau \leq \|G\|_\infty\int_0^1 |h_{j,k}(\tau)|^2 d\tau = \|G\|_\infty\sum_{\ell \in \Omega_j}\frac{|\psi_\ell^{j,k}|^2|\theta_\ell|^2}{|\gamma_\ell|^2}, \quad (A.4)$$

where the last equality follows from Parseval's relation. Then, using the bound (A.2) and Lemma A.2, inequality (A.4) implies that there exists a constant $C$ such that for all $f \in B^s_{p,q}(A)$

$$E|Z_1|^2 \leq C\frac{2^{2j\nu}}{n}. \quad (A.5)$$

Hence using the bounds (A.3) and (A.5), it follows that there exists a constant $C$ such that for all $f \in B^s_{p,q}(A)$, $E|\hat\beta_{j,k} - \beta_{j,k}|^2 \leq C\frac{2^{2j\nu}}{n}$.

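Control of $E|\hat\beta_{j,k} - \beta_{j,k}|^4$: from the decomposition (A.1) it follows that $E|\hat\beta_{j,k} - \beta_{j,k}|^4 \leq C(E|Z_1|^4 + E|Z_2|^4)$. As $Z_2$ is a centered Gaussian variable with variance $\frac{1}{n}\epsilon^2\sum_{\ell \in \Omega_j}\left|\frac{\psi_\ell^{j,k}}{\gamma_\ell}\right|^2 \leq C\frac{2^{2j\nu}}{n}$, one has that

$$E|Z_2|^4 \leq C\frac{2^{4j\nu}}{n^2}. \quad (A.6)$$

Then, remark that $Z_1 = \frac{1}{n}\sum_{m=1}^n Y_m$ with $Y_m = \sum_{\ell \in \Omega_j}\frac{\psi_\ell^{j,k}\theta_\ell}{\gamma_\ell}\left(e^{-i2\pi\ell\tau_m} - \gamma_\ell\right)$, and recall the so-called Rosenthal's inequality for moment bounds of iid variables [32]: if $X_1, \ldots, X_n$ are iid random variables such that $EX_j = 0$, $EX_j^2 \leq \sigma^2$, there exists a positive constant $C$ such that $E\left|\frac{1}{n}\sum_{i=1}^n X_i\right|^4 \leq C\left(\sigma^4/n^2 + E|X_1|^4/n^3\right)$.

Now remark that $EY_m = 0$, and arguing as previously for the control of $E|W_1|^2$, see equation (A.4), it follows that $E|Y_m|^2 \leq C2^{2j\nu}$ where $C$ is a constant not depending on $f$. Then, remark that

$$E|Y_1|^4 \leq C\left(\int_{\mathbb{R}}|h_{j,k}(\tau)|^4 g(\tau)d\tau + |\beta_{j,k}|^4\right) \quad \text{with} \quad h_{j,k}(\tau) = \sum_{\ell \in \Omega_j}\frac{\psi_\ell^{j,k}\theta_\ell}{\gamma_\ell}e^{-i2\pi\ell\tau},$$

and that

$$\int_{\mathbb{R}}|h_{j,k}(\tau)|^4 g(\tau)d\tau \leq \sup_{\tau \in \mathbb{R}}\{|h_{j,k}(\tau)|^2\}\int_{\mathbb{R}}|h_{j,k}(\tau)|^2 g(\tau)d\tau.$$

Note that using (A.2) and Lemma A.2 it follows that $|h_{j,k}(\tau)| \leq \sum_{\ell \in \Omega_j}\frac{|\psi_\ell^{j,k}||\theta_\ell|}{|\gamma_\ell|} \leq C\sum_{\ell \in \Omega_j}\frac{|\psi_\ell^{j,k}|}{|\gamma_\ell|} \leq C2^{j(\nu+1/2)}$ uniformly for $f \in B^s_{p,q}(A)$. Then, arguing as for the control of $E|W_1|^2$, see equation (A.4), one has that $\int_{\mathbb{R}}|h_{j,k}(\tau)|^2 g(\tau)d\tau \leq C2^{2j\nu}$, which finally implies that $E|Y_1|^4 \leq C2^{j(4\nu+1)}$, since $|\beta_{j,k}| \leq C$ uniformly for $f \in B^s_{p,q}(A)$. Then, using Rosenthal's inequality, it follows that there exists a constant $C$ such that for all $f \in B^s_{p,q}(A)$

$$E|Z_1|^4 \leq C\left(\frac{2^{4j\nu}}{n^2} + \frac{2^{j(4\nu+1)}}{n^3}\right), \quad (A.7)$$

which completes the proof for the control of $E|\hat\beta_{j,k} - \beta_{j,k}|^4$ using (A.6) and (A.7). □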
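Proof of Proposition 3.2: let $u > 0$, and remark that from the decomposition (A.1) it follows that

$$P(|\hat\beta_{j,k} - \beta_{j,k}| > u) \leq P(|Z_1| > u/2) + P(|Z_2| > u/2).$$

Recall that the $\eta_\ell$'s are iid $N_{\mathbb{C}}(0, 1/n)$. Hence, $Z_2$ is a centered Gaussian variable with variance $\frac{1}{n}\epsilon^2\sum_{\ell \in \Omega_j}\left|\frac{\psi_\ell^{j,k}}{\gamma_\ell}\right|^2 \leq \frac{1}{n}\sigma_j^2$ with $\sigma_j^2 = 2^{-j}\epsilon^2\sum_{\ell \in \Omega_j}|\gamma_\ell|^{-2}$, which implies that (see e.g. [25]) for any $t > 0$

$$P\left(|Z_2| > \sigma_j\sqrt{\frac{2t}{n}}\right) \leq 2\exp(-t). \quad (A.8)$$

By definition, $\tilde\gamma_\ell = \frac{1}{n}\sum_{m=1}^n e^{-i2\pi\ell\tau_m}$, and thus $Z_1 = \frac{1}{n}\sum_{m=1}^n (W_m - EW_m)$ with $W_m = \sum_{\ell \in \Omega_j}\frac{\psi_\ell^{j,k}\theta_\ell}{\gamma_\ell}e^{-i2\pi\ell\tau_m}$. Remark that the $W_m$ are random variables bounded by $\delta_j = 2^{-j/2}\sum_{\ell \in \Omega_j}\frac{|\theta_\ell|}{|\gamma_\ell|}$. Moreover, using Lemma A.1 with $h = |h_{j,k}|^2$, where $h_{j,k}(\tau) = \sum_{\ell \in \Omega_j}\frac{\psi_\ell^{j,k}\theta_\ell}{\gamma_\ell}e^{-i2\pi\ell\tau}$ for $\tau \in \mathbb{R}$, it follows that

$$E|W_1|^2 = \int_{\mathbb{R}}|h_{j,k}(\tau)|^2 g(\tau)d\tau \leq \|G\|_\infty\sum_{\ell \in \Omega_j}\frac{|\psi_\ell^{j,k}|^2|\theta_\ell|^2}{|\gamma_\ell|^2} \leq V_j^2,$$

where $V_j^2 = \|g\|_\infty 2^{-j}\sum_{\ell \in \Omega_j}\frac{|\theta_\ell|^2}{|\gamma_\ell|^2}$, since $|\psi_\ell^{j,k}| \leq 2^{-j/2}$ and $\|g\|_\infty = \|G\|_\infty$. Hence, from Bernstein's inequality it follows that for any $t > 0$ (see e.g. Proposition 2.9 in [25])

$$P\left(|Z_1| > \sqrt{\frac{2V_j^2t}{n}} + \delta_j\frac{t}{3n}\right) \leq 2\exp(-t). \quad (A.9)$$

Taking $u = 2\max\left(\sigma_j\sqrt{\frac{2t}{n}}, \sqrt{\frac{2V_j^2t}{n}} + \delta_j\frac{t}{3n}\right)$ for $t > 0$ concludes the proof of Proposition 3.2. □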
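The deviation level obtained by combining the Gaussian tail bound and Bernstein's inequality can be written as a small helper. This is only a sketch: it assumes that the level $t$ is tied to the sample size through $t = \eta\log(n)$, and the function name `threshold` and its arguments are hypothetical placeholders for $\sigma_j$, $V_j$ and $\delta_j$, which in practice have to be computed or estimated separately.

```python
import math

def threshold(sigma_j, V_j, delta_j, n, eta=1.0):
    """2 * max(Gaussian part, Bernstein part) at level t = eta * log(n) (assumed)."""
    t = eta * math.log(n)
    # deviation of the Gaussian term Z_2
    gaussian_part = sigma_j * math.sqrt(2.0 * t / n)
    # deviation of the bounded-variable term Z_1 (Bernstein-type bound)
    bernstein_part = math.sqrt(2.0 * V_j ** 2 * t / n) + delta_j * t / (3.0 * n)
    return 2.0 * max(gaussian_part, bernstein_part)
```

With this choice, a union bound over the two tail inequalities gives a probability of exceeding the threshold of at most $4\exp(-t) = 4n^{-\eta}$.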



Proof of LemmaO let Z = \J \0\ e + e 2 = J± ^^=1 l<wl 2 and reca11 that c m,^ = 6»£e" i2rfTm + 
ez£ >m . As the zi )m 's are iid standard complex Gaussian variables, by conditioning with respect 
to T±, . . . ,T n and by using a concentration inequality for Lipschitz functions of random variables 
(see e.g. |25J), one has that for any t > 

F^Z >EZ + £\J^Pj < exp(-i) and P (^Z < EZ - £\f^\ < exp(-t). (A.10) 

By Jensen's inequality and using the concavity of the function x 1— ► ^fx , one has that 

/ „ \ 1/2 

EZ ^ [- E E i<wi 2 = (n 2 + g2 ) 1/2 < m + e - 

Hence, the above inequality and flA. 10[) imply that P [z > \Q^\ + e + e y / ^) — exp(— t). Using 
the concavity of the function x 1— > and then the convexity of the function ih |x| it follows 



that EZ = E^J2l=i\cm,i\ 2 > 7 t YZ =i nc m A > ~El=i\®Cm,l\ = N- Hence, EZ > \9 e \ 

and thus using (1A.10|) . this implies that P [z < \6e\ — e \j~^j — ex P( — which completes the 
proof of Lemma 13.11 □ 

Proof of Proposition 13.31 from Lemma I3TT1 and the definition of |^|, it immediately follows 
that for any I 6 Qj 

P(|^| > N) < C2~ J n- 2 . (A.ll) 
Now remark that by definition of $\hat\lambda_j$ and $\lambda_j^*$

f (\k,k - Pj,h\ > Aj) = p - > A i; w e fiji 1^1 < l^l) 

+P - p jtk \ > Xj-, 3£ E fy, |^| > |^|) 

< P (|4- fc - fc | > A*) + ^ P > |^|) 

< C (n- 2 + #{%}2 _3 n _2 ) < Cn~ 2 , 

where the last inequality follows from Proposition 13.21 inequality (|A.llj) and the fact that 
#{%} < C2 J ', which completes the proof of Proposition 13.31 □ 

A.1 Proof of Theorem 3.1

Under the assumptions of Theorem 13.11 one can remark that Proposition 13.11 and Proposition 
13.31 imply that the conditions (|3.4|) . (|3.5p and (|3.6I) are satisfied. Consider the event A = {W E 

fy, \0t\ < \e e \ + e + 2 21ogi f 2J) }. From the bound ((All), and the definit ion of Aj and A* , one can 
easily check that on the event A 

Xj < 

where C is a uniform constant not depending on /. Using Lemma 13.11 one has that F(A) > 
1 — re -2 , and thus it follows from the above inequality that the random threshold Xj satisfies the 
condition (|3.7|) except on a set of probability less than n~ 2 . Hence the results follows using the 
arguments in the proof of Theorem 1 in |17| . □ 

A.2 Proof of Theorem 4.1

Let us fix a resolution $j \geq 0$ whose choice will be discussed later on, and consider for any $\eta = (\eta_i)_{i \in \{0, \ldots, 2^j-1\}} \in \{\pm 1\}^{2^j}$ the function $f_{j,\eta}$ defined as $f_{j,\eta} = \gamma_j\sum_{k=0}^{2^j-1}\eta_k\psi_{j,k}$, where $\gamma_j = c2^{-j(s+1/2)}$, and $c$ is a positive constant satisfying $c \leq A$, which implies that $f_{j,\eta} \in B^s_{p,q}(A)$. For some $0 \leq i \leq 2^j - 1$ and $\eta \in \{\pm 1\}^{2^j}$, define also the vector $\eta^i \in \{\pm 1\}^{2^j}$ with components equal to those of $\eta$ except the $i$-th one.

Let $\psi_{j,k} \star g(x) = \int_{\mathbb{R}}\psi_{j,k}(x - u)g(u)du$. By Parseval's relation, one has that $\|\psi_{j,k} \star g\|^2 = \sum_{\ell \in \Omega_j}|\psi_\ell^{j,k}|^2|\gamma_\ell|^2$. Hence, under Assumption 1.1 of a polynomial decay for $\gamma_\ell$ and using the fact that $|\psi_\ell^{j,k}| \leq 2^{-j/2}$ for Meyer wavelets (see [17]) it follows that there exists a constant $C$ such that $\|\psi_{j,k} \star g\|^2 \leq C2^{-2j\nu}$.



A.2.1 Algebraic settings

We set the resolution $j = j(n)$ to be the largest integer satisfying $2^j \leq n^{\frac{1}{2s+2\nu+1}}$. However, to simplify the presentation, the dependency of $j$ on $n$ is dropped in what follows. The definition of $f_{j,\eta}$, $\gamma_j$ and the bound $\|\psi_{j,k} \star g\|^2 \leq C2^{-2j\nu}$ thus imply that

$$\gamma_j = O\left(n^{-\frac{s+1/2}{2s+2\nu+1}}\right) \quad \text{and} \quad \|f_{j,\eta}\|^2 = O\left(n^{-\frac{2s}{2s+2\nu+1}}\right),$$

$$\|f_{j,\eta} \star g\|^2 = \Big\|\gamma_j\sum_k \eta_k(\psi_{j,k} \star g)\Big\|^2 = O\left(n^{-\frac{2s+2\nu}{2s+2\nu+1}}\right),$$

$$\|(f_{j,\eta} - f_{j,\eta^i}) \star g\|^2 = \|2\gamma_j\eta_i(\psi_{j,i} \star g)\|^2 = O(\gamma_j^2 2^{-2j\nu}) = O(n^{-1}).$$

From the above equations, we can thus conclude that $n\|(f_{j,\eta} - f_{j,\eta^i}) \star g\|^2 = O(1)$, but note that the term $n\|f_{j,\eta} \star g\|^2$ does not converge to 0. At last, observe that by assumption $s > 2\nu + 1$, which implies that $n\|f_{j,\eta} \star g\|^3 \to 0$, $n\|(f_{j,\eta} - f_{j,\eta^i}) \star g\|\,\|f_{j,\eta}\|\,\|f_{j,\eta} \star g\| \to 0$ and $n\|f_{j,\eta}\|^3 \to 0$.
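The orders of magnitude in this algebraic setting reduce to simple exponent arithmetic once $2^j$ is identified with $n^{1/(2s+2\nu+1)}$. The sketch below tabulates the exponents of $n$ exactly with rational numbers; it ignores constants and the integer rounding of $j$, and the function name and dictionary keys are hypothetical labels introduced for this illustration.

```python
from fractions import Fraction

def exponents(s, nu):
    """Exponents of n for the main quantities, assuming 2^j = n^(1/(2s+2nu+1))."""
    s, nu = Fraction(s), Fraction(nu)
    D = 2 * s + 2 * nu + 1
    return {
        "gamma_j": -(s + Fraction(1, 2)) / D,              # gamma_j = c 2^(-j(s+1/2))
        "norm_f_sq": -2 * s / D,                           # ||f||^2 ~ 2^j gamma_j^2
        "norm_f_conv_g_sq": -(2 * s + 2 * nu) / D,         # ||f * g||^2
        "norm_diff_conv_g_sq": Fraction(-1),               # ||(f_eta - f_eta^i) * g||^2
        "n_norm_f_conv_g_sq": 1 - (2 * s + 2 * nu) / D,    # positive: no convergence to 0
        "n_norm_f_cubed": 1 - 3 * s / D,                   # negative iff s > 2nu + 1
        "n_norm_f_conv_g_cubed": 1 - 3 * (s + nu) / D,     # negative iff s + nu > 1
    }
```

For instance, with $s = 4$ and $\nu = 1$ the exponent of $n\|f_{j,\eta}\|^3$ is $-1/11$, while for $s = 2$ and $\nu = 1$ it is $+1/7$, which shows where the condition $s > 2\nu + 1$ enters.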
A.2.2 Likelihood ratio

Let $F(Y)$ be a real valued and bounded measurable function of the $n$ trajectories $Y = (Y_1, \ldots, Y_n)$. Because of the independence of the $\tau_i$'s and the $W_i$'s, we have that

$$E_f[F(Y)] = \int_{\mathbb{R}^n} E_{f,W}[F(Y) \,|\, \tau_1 = t_1, \ldots, \tau_n = t_n]\,g(t_1)dt_1 \cdots g(t_n)dt_n,$$

where $E_f$ denotes the expectation with respect to the law of $Y = (Y_1, \ldots, Y_n)$ when $f$ is the true hypothesis, and $E_{f,W}$ is used to denote expectation only with respect to the law of the Brownian motions $W_1, \ldots, W_n$ where the shifts are fixed and $f$ is the true hypothesis. Now using the classical Girsanov formula it follows that for any function $h \in L^2([0, 1])$

$$E_f[F(Y)] = \int_{\mathbb{R}^n} E_{h,W}[F(Y)\Lambda_n(f, h) \,|\, \tau_1 = t_1, \ldots, \tau_n = t_n]\,g(t_1)dt_1 \cdots g(t_n)dt_n = E_h[F(Y)\Lambda_n(f, h)],$$

where $\Lambda_n(f, h)$ is the following likelihood ratio

$$\Lambda_n(f, h) = \prod_{i=1}^n \exp\left(\int_0^1 \big(f(x - \tau_i) - h(x - \tau_i)\big)dY_i(x) + \frac{1}{2}\|h\|^2 - \frac{1}{2}\|f\|^2\right).$$

In what follows, $f_0$ is used to denote the hypothesis $f = 0$.



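A.2.3 Technical Lemmas

Given $n$ arbitrary trajectories $Y_1, \ldots, Y_n$ from model (1.1), we define $E_\tau(\Lambda_n(f_{j,\eta}, f_0))$ as the expectation of the likelihood ratio with respect to the law of the random shifts, namely

$$E_\tau(\Lambda_n(f_{j,\eta}, f_0)) = \int_{\mathbb{R}^n}\prod_{i=1}^n \exp\left(\int_0^1 f_{j,\eta}(x - \tau_i)dY_i(x) - \frac{1}{2}\|f_{j,\eta}\|^2\right)g(\tau_1) \cdots g(\tau_n)\,d\tau_1 \cdots d\tau_n.$$

Lemma A.3 Suppose that for some constants $\lambda > 0$ and $\pi_0 > 0$ and all sufficiently large $n$ we have that

$$P_{f_{j,\eta}}\left(\frac{E_\tau(\Lambda_n(f_{j,\eta^i}, f_0))}{E_\tau(\Lambda_n(f_{j,\eta}, f_0))} \geq e^{-\lambda}\right) \geq \pi_0 \quad (A.12)$$

for all $f_{j,\eta}$ and all $i \in \{0, \ldots, 2^j - 1\}$. Then, there exists a positive constant $C$ such that for all sufficiently large $n$ and any estimator $\hat f_n$

$$\max_{\eta \in \{\pm 1\}^{2^j}} E_{f_{j,\eta}}\|\hat f_n - f_{j,\eta}\|^2 \geq C\pi_0e^{-\lambda}2^j\gamma_j^2.$$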

Proof of Lemma IA.3I : our proof is inspired by the proof of Lemma 2.10 in [H|- For this let 
Ijk = [Jj-, and arguing as in plj it follows that for any estimator f n 



23-1 



|/n fj,v\ 



+A n (fj^k , fj,ri) / |/n I 
Let $Z(Y) = \int_{I_{j,k}}|\hat f_n - f_{j,\eta}|^2 + \Lambda_n(f_{j,\eta^k}, f_{j,\eta})\int_{I_{j,k}}|\hat f_n - f_{j,\eta^k}|^2$



and remark that 



E 



fo,W 



An(/j-,r?) /o) / \fn ~ fj 



3,k 



+A n (/ J)t) fc,/ ) / | f n - fj iV k | 



g{Ti)dTi . ..g(T n )dr n . 



Now, since under the hypothesis $f_0$ the trajectories $Y_1, \ldots, Y_n$ do not depend on the random shifts $\tau_1, \ldots, \tau_n$, it follows that $\hat f_n$ does not depend on the shifts $\tau_1, \ldots, \tau_n$, as it is by definition a measurable function with respect to the sigma algebra generated by $Y_1, \ldots, Y_n$. This implies that for any $\delta > 0$



E f .[Z(Y)] = E f0iW 



r {Ki{fj,r\i fo)) / \fn — fj,n 

+E T (A n (/ ? .^,/ )) / \f n -f hV A 2 



> E 



fo,W 



E T (A n (f,,vJo))S 2 t 
E T (A n (f jjTlk Jo))S 2 t 



{//,,J/»-/^l 2 > 52 } 



Now, remark that 



1/2 



I fn fj,r 



+ 



1/2 



I fn fj,r] k I 



> 



1/2 



\fj,V fj, 



3,k 



> 2 7j |Vi*| 5 



1/2 



and let us argue as in the proof of Lemma 2 in [35] to a find a lower bound for fj ^ \ipjk\ 2 - 

By definition, see Section [3j iftj^fx) = 2 J ^ 2 X^iez + i) — fc) where ip* is the Meyer 

wavelet over R used to construct V- A change of variable shows that A- |^-jfc(a;)| 2 da; = 

Jo ip*(% + 2 3 'i)| dx which implies that Jj. fc |^-fe(a;)| 2 da; > J dx— ^j e z* Jo \ip*{x + 2H)\ dx. 

Now as f/j* has a fast decay, it follows that there exists a constant A > such that |?/>*(x)| < yq^. 
Thus, fj \tpjk(x)\ 2 dx > /J |V'*(x)| 2 dx — A 2 2~ 2 i Y^iei* i " 2 - Hence, it follows that there exists a 

constant p > such that ( J 7 fc IV'jfcl 2 ) — p for any fc, and all j sufficiently large. 
Hence if one takes 5 = 2p r yj it follows that 



which yields 



E/,,jZ(y)] > 5 2 E /o , w 



E 



/* ^ • /\ E r (An(/j,^,/o)) ' 



<5 2 E 



fit 



An(/j lJ? ,/o)min 



' E T (An^./p)) 

' E r (A n (/ j>7? ,/ )) 



5 2 E 



mm 



' E r (A n (f jtV k,f )y 
' E T (A n (/ jir7 ,/ )) 
and arguing as in the proof of Lemma 2.10 in [14] completes the proof. □



Now remark that under the hypothesis / = fj iTj , then each Yi is given by dYi(x) = fj jV (x — ai)dx + 
dWi(x) where each an is the true random shift of the i th trajectory. Thus, under this hypothesis, 
we obtain E r (A n (/,-, 1 / )) = 11"=! Jr 9{n)^ ^-^f^-^+f^-n )dW i (x)-i\\f j , v p] dT ^ and 

E r =nr=i/ R s(^ 

two expressions above, we now study the condition (|A.12j) . 

Lemma A. 4 Following the choices of j(n) and 7j( n ) given in our algebraic setting, there exists 
A > and ttq > such that for all sufficiently large n 

Proof of Lemma IA.4I : to obtain the required bound, we use several second order Taylor 
expansions. From the Cauchy-Schwarz inequality, we have 



eJO 



1+ / fj;q{x-Ti)f j)rt {x-ai)dx + Op{\\f jtr 



A similar argument yields E 



Jo hv( x - Ti)dWi{x) 



< 



and the Markov inequality used 



f jiV (x - Ti)dWi(x) 1 
with a second order expansion implies e-'o = 1 + j Q fj >rj (x — Ti)dWi(x) + 

l 1 2 

Jo fj>v( x ~ T i)dWi(x) +Op(||/j,r?|| 3 )- Looking now at the complete expression of E T (A n (/ 7;r/ , /o)), 



we obtain E T (A n (f jtV , f )) = LJILi f R g(n) 1 + Jq fj, v (x - n)f jin {x - a>i)dx + J Q f jtJ1 (x - Ti)dWi(z 



+ 



. The Fubini-type theorem for stochastic 



£ f hV { x - TjdWiix^ - IWfjj 2 + Opdi/^i 

integrals (see for instance [15j . chapter 3, lemma 4.1) enables to write logE T (A n (fj tV , /o)) 
E"=i lo § 1 + Jo (/?',»? * g)( x )fj,n( x -(*i)dx 

i r i i 2 

+ Jo (/?>?* 5)0)dW;(:c)± / R [J* / i)f ,(x - T i )dH / i (x)J y^)^ - |||/,^|| 2 

+o P (\\f j J 3 )} 

Then, applying a classical expansion of the logarithm log(l + z) = z — 4^ + 0(z 3 ), we obtain 



logE T A n (/ ji7? , f ) = z -- + O p (z 3 



f j>v (x - Ti)dWi(x) 



+ O p {n\\kvt)- 



(A.13) 
(A.14) 



= (fj,v*9)( x )fj,v( x - ai)dx+ (f jtV *g)(x)dWi( 

i=1 Jo Jo 

1 - ( f 1 f 1 

jV (x — ai)dx + / (fj jV -k g){x)dWi{x) (A. 15) 

+\ [ gin. 

1 Jr 



(A.16) 
We first discuss the size of the terms in equations (A.15) and (A.16). The first term in (A.15) can be bounded using the Cauchy-Schwarz inequality

Q[ (fj,v*9)(x)fj,r,(x -ai)dx^J < n\\f jiri *g\\ 2 \\f j:ri \\ 2 = 0p( n ll/wjll 4 )- 



But observe that EiU E Wi (jo(fj, v * g)(x)dWi(? 
0. Then, the Jensen inequality implies 



n \\fj,v * d\\ 2 which does not converge to 



E"=l E Wi ( Jr #0i) Jq 1 fj, v {x - Ti)dWi{x) 



< E"=i Jr #( T i) E Wi Jq 1 fj, v {x - n)dWi(x) dn = O p (n\\f j ,X). 

Let us now study the terms derived from double products in equations (IA.15P and (IA.16j) . use 
first that 2\ab\ < (a 2 + b 2 ) to get EHi^,^ fi(fj,v*9){*) 



Jr 9(i~i) Jo f jiV (x - Ti)dWi(x) 



dr; 



The Cauchy-Schwarz and Jensen inequalities imply 

EiLi^ fi(fj,v*9)(x)dWi(x) J R g(n) /J f jjV (x - r^dW^x) 
= O p (n\\fjJ 3 ). 



dn 



At last, the Cauchy-Schwarz and Jensen inequalities on the remaining double-product term imply 
also 



= a 



E"=i Ii(fj,v * 9)(x)fj, v (x - a,i)dx f£(Jj, v *g)(x)dWi(x) 
n\fj 



All the above bounds enables us to write 



Li := logA n (/^,/ ) 

= E"=i fo(fj,v * 9)(x)fj, v (. x ~ a i) dx + L(fj,v*9)(x)dWi{x) 

+kEtik9(Ti) [sihnix-r^dWiix)] -mj 2 

-|Er=i (ll{k^9){x)dW i {x))\oM\knt)- 



In a similar way, we can also write 

L 2 := logA n (/ ji7?i ,/ ) = E"=i Jo *9)(x)fj,t,(x - ai)dx+ 
li(fj^- k 9)(x)dW i (x) + | ££ =1 J R g(n) /J - r f )dW f (s 



2 lUj,^ 
For the sake of simplicity, let us write $h = f_{j,\eta^i} - f_{j,\eta} = -2\gamma_j\eta_i\psi_{j,i}$. The difference $L = L_2 - L_1$ can thus be decomposed as



L = V/ {h*g)(x)[fj iTI (x-ai)- fj >v *g(x)]dx 
i=i Jo 

+ / (h-kg)(x)(f jjV -kg)(x)dx+ / (/i*0)(x)dWi(3 

i=1 Jo 



ifZ [ 9(n) [ f j>v i(x - Ti)dWi 
lit / ^) fT/i^-rOdWil 



(s) 



/j,^ I 



i x )) - n \\fj, v > *9\\ 



Tj,n**9\ 



1 
2 

2 



E( [\fj,r,*9)(x)dWi(a 



(A.17) 

(A.18) 

(A.19) 

(A.20) 

(A.21) 
(A.22) 

(A.23) 

(A.24) 
(A.25) 



Bound for (A.17): we use the classical Bennett's inequality (see e.g. [25]) for a sum of independent and bounded variables. Define $S = \sum_{i=1}^n \int_0^1 (h \star g)(x)\left[f_{j,\eta}(x - \alpha_i) - f_{j,\eta} \star g(x)\right]dx$. From the Cauchy-Schwarz inequality, the random variables $\int_0^1 (h \star g)(x)f_{j,\eta}(x - \alpha_i)dx$ are bounded by a constant $b$ such that $b = \|h \star g\|\,\|f_{j,\eta}\|$. Let $v$ and $c$ be defined as

$$v = \sum_{i=1}^n E\left(\int_0^1 (h \star g)(x)f_{j,\eta}(x - \alpha_i)dx\right)^2 \quad \text{and} \quad c = b/3.$$

From the Cauchy-Schwarz inequality, we have that $v \leq n\|f_{j,\eta}\|^2\|h \star g\|^2$ and as $h = f_{j,\eta^i} - f_{j,\eta}$, by using our algebraic settings in Section A.2.1, we observe that $v \to 0$. Bennett's inequality therefore implies that for any $\kappa > 0$:

$$P(|S| > \kappa) \leq 2\exp\left(-\frac{\kappa^2}{2\left(n\|f_{j,\eta}\|^2\|h \star g\|^2 + \kappa\|h \star g\|\,\|f_{j,\eta}\|/3\right)}\right).$$

From our algebraic settings in Section A.2.1 one has thus that as $n \to \infty$, $P(|S| > \kappa)$ converges to 0.
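Bound for (A.19), (A.20), (A.21), (A.23): applying Lemma A.5 (proved below) to the chi-square statistics in the expressions (A.19) and (A.20) yields that for any $\kappa > 0$

$$P\left(\left|\frac{1}{2}\sum_{i=1}^n\int_{\mathbb{R}} g(\tau_i)\left(\int_0^1 f_{j,\eta^i}(x - \tau_i)dW_i(x)\right)^2 d\tau_i - \frac{n}{2}\|f_{j,\eta^i}\|^2\right| > \kappa\right) \leq 2\exp\left(-\frac{\kappa^2}{n\|f_{j,\eta^i}\|^4 + 2\kappa\|f_{j,\eta^i}\|^2}\right),$$

and

$$P\left(\left|\frac{1}{2}\sum_{i=1}^n\int_{\mathbb{R}} g(\tau_i)\left(\int_0^1 f_{j,\eta}(x - \tau_i)dW_i(x)\right)^2 d\tau_i - \frac{n}{2}\|f_{j,\eta}\|^2\right| > \kappa\right) \leq 2\exp\left(-\frac{\kappa^2}{n\|f_{j,\eta}\|^4 + 2\kappa\|f_{j,\eta}\|^2}\right).$$

Similarly we obtain for the chi-square statistics in (A.21) and (A.23) that for any $\kappa > 0$

$$P\left(\left|\frac{1}{2}\left[\sum_{i=1}^n\left(\int_0^1 (f_{j,\eta} \star g)(x)dW_i(x)\right)^2 - n\|f_{j,\eta} \star g\|^2\right]\right| > \kappa\right) \leq 2\exp\left(-\frac{\kappa^2}{n\|f_{j,\eta} \star g\|^4 + 2\kappa\|f_{j,\eta} \star g\|^2}\right),$$

and

$$P\left(\left|\frac{1}{2}\left[\sum_{i=1}^n\left(\int_0^1 (f_{j,\eta^i} \star g)(x)dW_i(x)\right)^2 - n\|f_{j,\eta^i} \star g\|^2\right]\right| > \kappa\right) \leq 2\exp\left(-\frac{\kappa^2}{n\|f_{j,\eta^i} \star g\|^4 + 2\kappa\|f_{j,\eta^i} \star g\|^2}\right).$$

It follows from the algebraic setting in Section A.2.1 that $n\|f_{j,\eta}\|^4 \to 0$ and $\|f_{j,\eta}\|^2 \to 0$, as well as $n\|f_{j,\eta} \star g\|^4 \to 0$ and $\|f_{j,\eta} \star g\|^2 \to 0$, and the above probabilities converge to zero as $n \to \infty$.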



Bound for (A.18), (A.22), (A.24): using the first term of (A.18), a simple computation shows that $\sum_{i=1}^n\int_0^1 ((f_{j,\eta^i} - f_{j,\eta}) \star g)(x)(f_{j,\eta} \star g)(x)dx - \frac{n}{2}\|f_{j,\eta^i} \star g\|^2 + \frac{n}{2}\|f_{j,\eta} \star g\|^2 = -\frac{n}{2}\|h \star g\|^2$, and we obtain from our algebraic settings that this term converges to 0 since $n\|h \star g\|^2 \to 0$. Moreover, the second term of (A.18) is the sum of $n$ i.i.d. centered normal variables and the Cirelson-Ibragimov-Sudakov inequality [7] ensures that

$$P\left(\left|\sum_{i=1}^n\int_0^1 (h \star g)(x)dW_i(x)\right| > \kappa\right) \leq 2\exp\left(-\frac{\kappa^2}{2n\|h \star g\|^2}\right),$$

and thus the above probability goes to zero.

Bound for (A.25): from our algebraic settings in Section A.2.1 it follows immediately that $n\|f_{j,\eta}\|^3 \to 0$.

Hence, by combining all the above bounds, we have shown that $L_2 - L_1$ is the sum of various terms which all converge to zero in probability or are larger than some negative constant with probability tending to one as $n \to +\infty$, which completes the proof of Lemma A.4. □

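Lemma A.5 Let $g$ be a density function on $\mathbb{R}$, and $(W_i)_{i \in \{1, \ldots, n\}}$ be independent standard Brownian motions on $[0, 1]$. Then, for any $f \in L^2([0, 1])$ and $\alpha > 0$,

$$P\left(\frac{1}{2}\sum_{i=1}^n\int_{\mathbb{R}} g(\tau_i)\left(\int_0^1 f(x - \tau_i)dW_i(x)\right)^2 d\tau_i - \frac{n}{2}\|f\|^2 \geq \alpha\right) \leq \exp\left(-\frac{\alpha^2}{n\|f\|^4 + 2\alpha\|f\|^2}\right).$$

Proof of Lemma A.5: Consider $Q_n = \frac{1}{2}\sum_{i=1}^n\int_{\mathbb{R}} g(\tau_i)\left(\int_0^1 f(x - \tau_i)dW_i(x)\right)^2 d\tau_i - \frac{n}{2}\|f\|^2$. We use a Laplace transform technique to bound $P(Q_n \geq \alpha)$. For any $0 < t < 1/\|f\|^2$, we have by Markov's inequality and the independence of the $W_i$'s

$$P(Q_n \geq \alpha) \leq e^{-\alpha t - \frac{n}{2}\|f\|^2 t}\prod_{i=1}^n E\left[\exp\left(\frac{t}{2}\int_{\mathbb{R}} g(\tau_i)\left(\int_0^1 f(x - \tau_i)dW_i(x)\right)^2 d\tau_i\right)\right].$$

We apply now Jensen's inequality for the exponential function and the measure $g(\tau)d\tau$ to obtain

$$P(Q_n \geq \alpha) \leq e^{-\alpha t - \frac{n}{2}\|f\|^2 t}\prod_{i=1}^n\int_{\mathbb{R}} g(\tau_i)E\left[\exp\left(\frac{t}{2}\left(\int_0^1 f(x - \tau_i)dW_i(x)\right)^2\right)\right]d\tau_i.$$

Remark that $\left(\int_0^1 f(x - \tau_i)dW_i(x)\right)^2$ follows a (scaled) chi-square distribution whose Laplace transform does not depend on $\tau_i$, and thus

$$P(Q_n \geq \alpha) \leq \exp\left(-\alpha t - \frac{n}{2}\|f\|^2 t - \frac{n}{2}\log(1 - t\|f\|^2)\right).$$

Let $u = \frac{\alpha}{\frac{n}{2}\|f\|^2}$; minimizing now the last bound with respect to $t$ yields the optimal choice $t^* = \frac{u}{(1+u)\|f\|^2}$. With this choice, we obtain $P(Q_n \geq \alpha) \leq \exp\left(\frac{n}{2}[\log(1 + u) - u]\right)$. Now use the classical bound $\log(1 + u) - u \leq -\frac{u^2}{2(1+u)}$, valid for all $u \geq 0$, to get $P(Q_n \geq \alpha) \leq \exp\left(-\frac{n}{4} \times \frac{u^2}{1+u}\right) = \exp\left(-\frac{\alpha^2}{n\|f\|^4 + 2\alpha\|f\|^2}\right)$, which completes the proof of the lemma. □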
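The optimisation step in the Laplace transform argument can be verified numerically. The sketch below evaluates the exponent $-\alpha t - \frac{n}{2}\|f\|^2t - \frac{n}{2}\log(1 - t\|f\|^2)$, the candidate minimiser $t^* = u/((1+u)\|f\|^2)$ with $u = 2\alpha/(n\|f\|^2)$, the optimised value $\frac{n}{2}[\log(1+u) - u]$ and the final exponent $-\alpha^2/(n\|f\|^4 + 2\alpha\|f\|^2)$. The argument `norm2` stands for $\|f\|^2$; all function names are hypothetical and the formulas are those derived from the chi-square Laplace transform under the assumptions of the lemma.

```python
import math

def chernoff_exponent(t, alpha, n, norm2):
    """Exponent of the Chernoff bound, defined for 0 < t < 1/norm2."""
    return -alpha * t - 0.5 * n * norm2 * t - 0.5 * n * math.log(1.0 - t * norm2)

def optimal_t(alpha, n, norm2):
    """Stationary point t* = u / ((1 + u) * norm2) with u = 2*alpha / (n*norm2)."""
    u = 2.0 * alpha / (n * norm2)
    return u / ((1.0 + u) * norm2)

def optimized_exponent(alpha, n, norm2):
    """Value of the exponent at t*: (n/2) * (log(1 + u) - u)."""
    u = 2.0 * alpha / (n * norm2)
    return 0.5 * n * (math.log1p(u) - u)

def final_exponent(alpha, n, norm2):
    """Simplified exponent after using log(1+u) - u <= -u^2 / (2(1+u))."""
    return -alpha ** 2 / (n * norm2 ** 2 + 2.0 * alpha * norm2)
```

Since the exponent is convex in $t$ on $(0, 1/\|f\|^2)$, the stationary point is the global minimiser, and the final exponent is a slightly weaker but more readable upper bound.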



A.2.4 A lower bound for the minimax risk

By Lemma A.3 and Lemma A.4 it follows that there exists a constant $C_1$ such that for all sufficiently large $n$,

$$\inf_{\hat f_n}\sup_{f \in B^s_{p,q}(A)} E\|\hat f_n - f\|^2 \geq \inf_{\hat f_n}\max_{\eta \in \{\pm 1\}^{2^j}} E_{f_{j,\eta}}\|\hat f_n - f_{j,\eta}\|^2 \geq C_1n^{-\frac{2s}{2s+2\nu+1}},$$

which completes the proof of Theorem 4.1. □